Draft version March 4, 2013 

Preprint typeset using l^f^i style emulateapj v. 5/2/1 1 



THE EINSTEIN® HOME SEARCH FOR RADIO PULSARS AND PSR J2007+2722 DISCOVERY 
B. Allen^'^'^'"^, B. Knispel^'^, J. M. Cordes^, J. S. Deneva^, J. W. T. Hessels^'^, D. Anderson^, C. Aulbert^'^, O. Bock^'^, 

A. BRAZIER^'^^, S. CHATTERJEE^ P. B. DEMOREST^^ H. B. EGGENSTEIN^'^, H. FEHRMANN^'^, E, V. GOTTHELF^^, D. HAMMER^ 

V. M. KASPI^^ M. Kramer^"^, a. G. Lyne^^, B. Machenschalk^'^, M. A. McLaughlin C. Messenger^'^, H. J. Pletsch^'^, 

S. M. RANS0M^\ I. H. STAIRS^^ B. W. STAPPERS^^ N. D. R. BHAT^^ S. BOGDANOV^^ R CAMILo'^'^ D. J. CHAMPION^"^, 
F. CRAWFORD^^, G. DESVIGNES^^, p. C. C. FREIRE^"^, G. HEALD^, R A. JENET^\ P LAZARUS^"^, K. J. LEE^"^, J. van LEEUWEN^'^ 
R. LYNCH^^ M. A. PAPA^'^'^ R. PRIX^'^, R. ROSEN^^, P SCHOLZ^^ X. SIEMENS\ K. ST0VALL^\ A. VENKATARAMAN^ , W. ZHU^^ 

Draft version March 4, 2013 

Abstract 

Einstein @ Home aggregates the computer power of hundreds of thousands of volunteers from 192 countries, 
to search for new neutron stars using data from electromagnetic and gravitational- wave detectors. This paper 
presents a detailed description of the search for new radio pulsars using PALFA survey data from the Arecibo 
Observatory. The enormous computing power allows this search to cover a new region of parameter space; 
it can detect pulsars in binary systems with orbital periods as short as 1 1 min. We also describe the first 
Einstein @ Home discovery, the 40.8 Hz isolated pulsar PSR J2007+2722, and provide a full timing model. 
PSR J2007+2722's pulse profile is remarkably wide with emission over almost the entire spin period. This 
neutron star is most likely a disrupted recycled pulsar, about as old as its characteristic spin-down age of 
404 Myr. However there is a small chance that it was born recently, with a low magnetic field. If so, upper 
limits on the X-ray flux suggest but can not prove that PSR J2007+2722 is at least ~ 500 kyr old. In the 
future, we expect that the massive computing power provided by volunteers should enable many additional 
radio pulsar discoveries. 

Subject headings: pulsars, radio pulsars, volunteer distributed computing, PSR J2007+2722 



1. INTRODUCTION 

Einstein@Home is an on-going volunteer distributed com- 
puting project (Anderson et al. 2006), launched in early 2005. 
More than a quarter-million members of the general public 
have "signed up" their laptop and desktop computers. When 
otherwise idle, these computers download observational data 
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from the Einstein@Home servers, search the data for weak as- 
trophysical signals, and return the results of the analysis. The 
collective computing power is on par with the largest super- 
computers in the world. 

The goal of Einstein@Home is to discover neutron stars, us- 
ing data from an international network of gravitational- wave 
(GW) detectors (Sathyaprakash & Schutz 2009), from radio 
telescopes (Lyne & Graham-Smith 1998; Lorimer & Kramer 
2004), and from the Large Area Telescope (Atwood et al. 
2009) gamma-ray detector aboard the Fermi Satellite. Be- 
cause the expected signals are weak, and the source parame- 
ters^^ are unknown, the sensitivity of the gravitational-wave 
searches (Brady et al. 1998; Brady & Creighton 2000) the ra- 
dio pulsar searches (Brooke et al. 2007), and the gamma-ray 
searches (Pletsch & Allen 2009; Pletsch et al. 2012b,c,a) are 
limited by the available computing power. 

Before 2009, Einstein@Home only searched data from the 
Laser Interferometer Gravitational- wave Observatory (LIGO) 
(Abramovici et al. 1992; Barish & Weiss 1999; Abbott et al. 
2009). So far these searches have not found any sources, 
but have set new and more sensitive upper limits on possi- 
ble continuous gravitational- wave (CW) emissions (Abbott 
et al. 2009b; Abbott et al. 2009). These searches are ongoing, 
with increasing sensitivity arising from improved data anal- 
ysis methods (Pletsch & Allen 2009) and better-quality data 
(Smith & LSC 2009). 

In 2009, Einstein@Home also began searching radio survey 
data from the Arecibo telescope in Puerto Rico: the world's 
largest and most sensitive radio telescope. Beginning in De- 
cember 2010 a similar search using data from Parkes was also 
started; the differences from the Arecibo search and some re- 

23 Depending upon the type of search, these unknown parameters might 
include the sky position, spin frequency, spin-down rate, orbital parameters, 
etc. 
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suits are described in Knispel et al. (2013). 

Starting in summer 2011, Einstein@Home also began a 
search for isolated gamma-ray pulsars in data from the Fermi 
SatelHte's Large Area Telescope Atwood et al. (2009); this 
will be described in future publications. 

The Arecibo data are collected by the Pulsar ALFA 
(PALFA) Consortium using the Arecibo L-band Feed Array 
(ALFA^^). ALFA is a cryogenically cooled, seven feed-horn, 
dual-polarization receiver operating at 1.4 GHz (Cordes et al. 
2006); for pulsar searches its output is fed into fast, broad- 
band autocorrelation spectrometers (Dowd et al. 2000) called 
Wideband Arecibo Pulsar Processors (WAPPs). After April 
2009 the WAPPs were replaced by even broader-bandwidth 
higher-resolution "Mock" spectrometers (Mock 2007). 

The computing capacity of Einstein® Home is used to 
search the spectrometer output for signals from neutron stars 
in short-period orbits around companion stars. This is a 
poorly-explored region of parameter space, where other radio- 
pulsar search pipelines lose much or most of their sensitivity. 
The detection of these pulsars with standard Fourier methods 
is hampered by Doppler smearing of the pulsed signal caused 
by binary motion during the survey observation (Johnston & 
Kulkarni 1991). 

Previous searches (Johnston & Kulkarni 1991; Anderson 
et al. 1990; Camilo et al. 2000) have utilized "acceleration 
searches" to correct for the lowest-order modulation arising 
from the binary motion, which may be modeled as a constant 
acceleration along the line-of-sight. Although computation- 
ally efficient, acceleration techniques are only effective when 
the observation time is a small fraction of the orbital period, 
and where the acceleration is therefore roughly constant over 
the duration of the observation. Thus, these techniques are not 
sensitive to the most compact systems (Ransom et al. 2002). 
In contrast, the computing power of Einstein® Home enables 
a full demodulation to be carried out, giving substantially in- 
creased sensitivity to signals from pulsars in compact circular 
orbits with periods below ~ 1 h. 

In August 2010, Einstein@Home announced its first dis- 
covery of a new neutron star (Knispel et al. 2010) which ap- 
pears to be the fastest- spinning "disrupted recycled pulsar" 
(DRP) so far found (Belczynski et al. 2010). In the same 
month, Einstein@Home also discovered a 48 Hz pulsar in a 
binary system (Knispel et al. 2011). Further Einstein@Home 
discoveries in Parkes Multi-Beam Pulsar Survey (PMPS) are 
described in Knispel et al. (2013). As of January 2013, Ein- 
stein® Home has discovered almost fifty radio pulsars. 

This paper has two purposes. First, it provides a full de- 
scription of the Einstein@Home radio pulsar search and post- 
processing pipeline. Second, it provides a detailed description 
and full timing solution for the first Einstein@Home discov- 
ery, the 40.8 Hz pulsar PSR J2007+2722 (Knispel et al. 2010). 

The paper is structured as follows. Section 2 presents a 
general description of the Einstein@Home computing project, 
including its motivation, its history, and its technical design 
and structure. Section 3 is an brief overview of the PALFA 
survey, including its history, the data taking rates, and data 
acquisition system. Section 4 is a detailed technical descrip- 
tion of the Einstein® Home search for radio pulsars, start- 
ing from the centralized data preparation, through the dis- 
tributed processing on volunteer's computers, and centralized 
post-processing. Section 5 describes the discovery of the first 
Einstein@Home radio pulsar, PSR J2007+2722. Section 6 is 

http : / / www . naic . edu/ alf a/ 



about the subsequent follow-up investigations and studies, in- 
cluding observations at multiple frequencies, and accurate de- 
termination of the sky position through gridding and timing. 
We also discuss the evolutionary origin of PSR J2007-H2722. 
This is followed in Section 7 by a short discussion and con- 
clusion. 

Unless otherwise stated, all coordinates in this paper are in 
the J2000 coordinate system, and c denotes the speed of light. 

2. THE EINSTEIN@HOME DISTRIBUTED COMPUTING PROJECT 

2. 1 . Volunteer Distributed Computing 

The basic motivation for volunteer distributed computing is 
simple: the aggregate computing power owned by the general 
public exceeds that of universities, and public and private re- 
search laboratories, by two to three orders-of-magnitude. Sci- 
entific research whose progress is limited or constrained by 
computing can benefit from access to even a small fraction of 
these resources. This type of research includes both numer- 
ical simulation and Monte-Carlo-type exploration of param- 
eter spaces, that make no (direct) use of observational data, 
and data-mining and data-analysis efforts which perform deep 
searches through (potentially very large) observational data 
sets. 

Worldwide, there are more than one billion Personal Com- 
puters (PCs) which are connected to the Internet. These PCs 
typically contain x86-architecture Central Processor Units 
(CPUs) manufactured by Intel or AMD, with two or more 
cores. Each core can perform four floating-point computa- 
tions (two adds, two multiplies) per clock cycle. They typi- 
cally have one Gigabyte (GB) or more Random Access Mem- 
ory (RAM), and storage devices (rotating or solid-state disks) 
with hundreds of GB of storage. Many of these systems also 
contain Graphics Processor Units (GPUs) which can perform 
floating point calculations one to two orders-of-magnitude 
faster than a modern CPU core. 

The raw computational capacity of each of these consumer 
computers is similar to that of the systems used as building 
blocks for computer clusters or research supercomputers. In 
fact modem research computers are made possible only by the 
economies of scale of the consumer marketplace, which en- 
sures that the basic components are inexpensive and widely 
available. But research machines typically consist of hun- 
dreds or thousands of these CPUs; volunteer distributed com- 
puting offers access to hundreds of thousands or millions of 
these CPUs. 

2.2. Constraints on suitable computing problems 

Volunteer distributed computing is only a suitable solution 
for some computing and data analysis problems: there are 
both social and technical constraints. To attract volunteers, 
the research must resonate with the "person in the street". It 
must have clear and understandable goals that appeal to the 
general public and that excite and maintain interest. Experi- 
ence shows that at least four areas have these qualities: med- 
ical research, mathematics, climate/environmental science, 
and astronomy and astrophysics. 

The technical constraints arise because the computers are 
only connected by the public Internet. This is very differ- 
ent than research supercomputers, which typically have low- 
latency high-speed networks which enable any CPU to access 
data from any other CPU with nanosecond latencies and GB/s 
bandwidth. In contrast, the latency in volunteer distributed 
computing can be fifteen orders-of-magnitude larger; a vol- 
unteer's computer may only connect to the Internet once per 
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week! The average available bandwidth is also much smaller, 
particularly for data distributed from a central (project) loca- 
tion. For example if a project is distributing data through a 
IGb/s public Internet connection to 100k host machines, the 
average bandwidth available per host is 10 kb/s, six orders-of- 
magnitude less than for a research facility. 

The main technical constraints on the computing problem 
are therefore as follows: 

(I) It must He in the class of so called "embarrassingly paral- 
lel" problems, whose solution requires no communication or 
dependency between hosts. 

(II) It must have a high ratio of computation to Input/Output. 
For example if the project distributes data through a single 
IGb/s network connection, and the application requires 1 MB 
of data per CPU-core-hour, then at most 360k host CPU-cores 
can be kept fully-occupied on a 24 x 7 basis. 

(III) It must use only a small fraction of available RAM (say 
100 MB) so that the Operating System (OS) can quickly swap 
tasks, providing normal interactive computer response for vol- 
unteers. 

(IV) It must be capable of frequent and lightweight check- 
pointing (saving the internal state for later restart) using only 
a small amount of total storage (say 10 MB), so that it can 
snatch idle compute cycles but stop processing when the vol- 
unteer is using the computer or turns the computer off. 

(V) The code that will run on volunteer's hosts must be ma- 
ture code, not "development" code. This is because it must be 
ported to several different OSs, and then run reliably on vol- 
unteers computers. Months of work are required for porting, 
code optimization, and tuning. This investment only makes 
sense if the core of the scientific code is stable and robust. 
If the scientific code is not mature, much of the other work 
needs to be repeated anew. 

In short, volunteer distributed computing is not a panacea: it 
can only be used to solve some computing problems. 

2.3. Trends in computing power and GPUs 

The latest trend in computing is the move to systems con- 
taining large numbers of processing cores. This is largely in 
response to the fundamental physical limits that arise in man- 
ufacturing integrated circuits. For more than forty years, the 
computing power available at fixed cost has doubled every 18 
months. This was a consequence of "Moore's law", a heuristic 
observation that the number of components on an integrated 
circuit grew exponentially with time. For the past forty years 
this trend was made possible by the shrinking of the "process 
size" (the size of the smallest components on an integrated cir- 
cuit) along with a corresponding increase in clock speed and 
a decrease in operating voltage. Operating voltages can no 
longer be decreased because they have approached the band- 
gap energy, and process sizes, currently at 22nm or 32nm, 
have been shrinking more slowly than in the past. They are 
expected to decrease to about lOnm, but can not get much 
smaller; the inter-atomic spacing in a silicon lattice is 0.7nm. 
To get more computing power at reasonable cost, the only ap- 
proach is to put large numbers of cores onto a single chip. 

Fortunately the consumer marketplace has a demand for 
such systems: they are called GPUs and are used for high- 
quality rendering of graphics and video. The evolution of 
television from radio broadcasting to transmission over the 
Internet is now underway, and it is expected that over the 
next decade this will be an important driving force behind 
further growth in Internet capacity and graphics capability 
in consumer computers. Already more than 25% of Ein- 



stein® Home host machines contain GPUs, and we expect that 
this will approach 100% within the coming three years. 

Current-generation GPUs have 500 or more cores, each 
capable of simultaneously doing one floating-point addition 
and one floating-point multiplication per clock cycle. The 
two leading manufacturers of such systems (NVIDIA and 
AMD/ATI) both provide Application Programming Interfaces 
(APIs) that permit GPUs to be used for general-purpose com- 
puting. Thus, over the coming decade, if GPU capacity is ac- 
cessed and exploited, volunteer distributed computing should 
continue to provide "Moore's law scaling" and to provide 
access to more computing cycles than more traditional ap- 
proaches. 

In the longer-term, tablet devices like iPads and Kindle and 
smartphones will probably provide the bulk of the comput- 
ing power. Their CPUs and GPUs are typically an order-of- 
magnitude slower than laptop or desktop computers, however 
very large numbers are being marketed and used. These de- 
vices often spend a significant fraction of the time idle but 
connected to charging stations; during this time they repre- 
sent a significant computing resource. 

2.4. The Einstein@Home Project 

Einstein@ Home was formally launched at the American 
Association for the Advancement of Science meeting on 
February 19, 2005, as as one of the cornerstone activities 
of the World Year of Physics 2005 (Stone 2004). Mem- 
bers of the general public, whom we refer to as "volun- 
teers", "sign up" by visiting the project web site http: 
//einstein . phys .uwm.edu and downloading a small 
executable, which is available for Windows, Mac and Linux 
platforms. It takes a couple of minutes to install on a typi- 
cal home computer or laptop (which is then technically ref- 
ereed to as a "host"). After that, when the host is otherwise 
idle, it downloads observational astrophysics data from one 
of the Einstein@Home servers, and analyzes it in the back- 
ground, searching for signals. The results of the analysis are 
automatically uploaded to a project server, and more work is 
requested. The system is designed to operate without further 
attention from the volunteer, although it is also highly con- 
figurable and can be tuned for specific needs if desired. The 
collective computing power is on par with the largest super- 
computers in the world. 

The Einstein® Home project also incorporates additional 
features intended to attract, inform, motivate and retain vol- 
unteers. These include message boards where volunteers can 
exchange messages with other volunteers and project person- 
nel and scientists; granting computing credits as a symbolic 
"reward" for successful computing work; the ability to form 
teams to compete for computing credits; informational web 
pages describing the science and results; and access to dy- 
namic web pages that allow volunteers to track the individual 
computing jobs done by their computers. 

There are a number of such volunteer computing projects 
world-wide. They search for signs of extra-terrestrial life 
(SETI@Home, Anderson et al. (2002)), study protein-folding 
(Folding @Home, Shirts & Pande (2000)), search for new 
drugs (Rosetta@Home, Cooper et al. (2010)), search for large 
Mersenne prime numbers (GIMPS^^), simulate the Earth's 
climate evolution (ClimatePrediction.net, Stainforth et al. 
(2005)) and so on. 

The home page of the Great Internet Mersenne Prime Search (GIMPS) 
is http : / /www. mersenne . org/. 
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Einstein@Home is one of the largest of these projects; to 
date, hosts registered by more than 330,000 people have re- 
turned valid results to Einstein® Home and have delivered 
more than one billion CPU hours. There are Einstein® Home 
volunteers from all 192 countries recognized by the United 
Nations; currently, more than 100,000 different computers, 
owned by more than 55,000 volunteers, contact the Ein- 
stein® Home servers each week, requesting work and upload- 
ing results. 

The aggregate computing power of Einstein® Home is 
shown in real-time on a public server status page^^. As of 
January 2013, it exceeds one Petaflop on a 24 x 7 basis; ac- 
cording to the current (November 2012) Top-500 list, there 
are only 23 computers on the planet which deliver more com- 
puting power. To help understand the scale, it is useful to pro- 
vide some cost comparisons. Simply providing the electrical 
power needed to support this amount of computation would 
cost $3-6M/year. The costs of hardware and administration 
would be substantially greater^^. (Note: at the time of the 
PSR J2007-F2722 discovery in August 2010, there were about 
250,000 registered volunteers, and Einstein® Home delivered 
about 200 Tflops of computing power.) 

The original and long-term goal of Einstein® Home is to 
search gravitational-wave data to find the continuous -wave 
signals emitted by rapidly-rotating neutron stars. The search 
is an integral part of the coordinated world-wide effort to 
make the first direct detections of gravitational waves. These 
were predicted by Einstein in 1916, but have never been di- 
rectly detected. A new generation of instruments, the Laser 
Interferometer Gravitational- wave Observatory (LIGO) in the 
USA, VIRGO in Italy, GEO in Germany, and the KAGRA 
Large-scale Cryogenic Gravitational- wave Telescope Project 
in Japan, offers the first realistic hopes of such a detection. 
Gravitational waves produced by rapidly spinning neutron 
stars are one of the four main targets for these detectors, but 
because the signals are weak, and the source parameters (sky 
position, frequency, spin-down rate, and so on) are not known, 
the sensitivity of the search is limited by the available compu- 
tational power (Brady et al. 1998; Brady & Creighton 2000). 

Einstein® Home has carried out and published the most 
sensitive "blind" all- sky searches using data from the best 
gravitational wave detectors. While these searches have not 
yet detected any signals, they continue to be a principle target 
of the project. Upper limits obtained from Einstein® Home 
have been published using data from the Laser Interferome- 
ter Gravitational- wave Observatory (LIGO) instrument's 4th 
and 5th science runs (S4 and S5) (Abbott et al. 2009b; Ab- 
bott et al. 2009; Aasi et al. 2012). Einstein®Home is also 
re-searching the full S5 and S6 data sets using a new method 
that has been proved optimal, for conventional assumptions 
about the nature of the instrumental and environmental noise 
sources (Pletsch & Allen 2009; Pletsch 2010, 2011). 

Since 2009, Einstein® Home has also been searching radio- 

The Einstein @ Home server status page gives a real-time display of the 
number of active hosts, the number of active volunteers, and the average 
CPU power. It may be found at http : //einstein .phys . uwm.edu/ 
server_status . html. 

One can use the Amazon Cloud calculator to estimate the monetary costs 
of replacing Einstein® Home CPU cycles with equivalent commercial "cloud 
computing" CPU cycles. For example in the last week of October 2010, Ein- 
stein@Home hosts did 35 711 CPU- weeks of computing. The hosts are thus 
the equivalent of about 35k CPU cores operating 24x7. At that time, using 
the Linux/small and Linux/large resources, and leaving out any data transfer 
or storage costs, the estimated cost for the Amazon/US-Standard cloud was 
$2.2M/month and $8.7M/month without monitoring. 



frequency electromagnetic data from the Arecibo Observa- 
tory, looking for the regular patterns of pulsation produced 
by pulsars. The 305-meter Arecibo Telescope is the world's 
largest and most sensitive radio antenna, and has discovered 
a substantial fraction of all known pulsars. Searches in pul- 
sar survey data are also computationally limited, particularly 
when looking for radio pulsars in tight binary systems, where 
the rapidly changing accelerations modulate the pulse peri- 
ods (Lorimer 2008). The massive computing power of Ein- 
stein® Home (one- to two- orders-of-magnitude more than is 
typically available in the radio astronomy community) is used 
to search for neutron stars in short-period orbits around com- 
panion stars. This is an unexplored region of parameter space, 
where other search methods lose much or most of their sensi- 
tivity. 

Searches for binary radio pulsars can be characterized by 
the ratio of phase-coherently analyzed observation time T to 
orbital period Porb of the pulsar. There are three cases. (1) 
For orbital periods long compared to the observation time, i.e. 
T/Porh ^ 0-1. the signal can be well described assuming a 
constant acceleration and 'classical' acceleration searches are 
a computationally efficient analysis method (Ransom et al. 
2002) with only small sensitivity losses. (2) If multiple orbits 
fit into a single observation, i.e. T/Porb ^ 5, then sideband 
searches provide a computational short-cut with a factor 2-3 
loss in sensitivity (Jouteux et al. 2002; Ransom et al. 2003) 
compared with the optimal matched filter coherent search. 
(3) The intermediate range 0.1 < T/Porb ^ 5 is accessible 
with high sensitivity by time-domain re- sampling with a large 
number of orbital parameter combinations (templates). 

The Einstein® Home search is characterized by case (3) 
above; matched filtering is used to convolve observational 
data with large numbers of templates. These methods and 
the construction of optimal template banks have been thor- 
oughly investigated in the context of gravitational wave data 
analysis (Owen 1996; Owen & Sathyaprakash 1999; Abbott 
et al. 2007, 2009a; Abbott et al. 2009) and are adopted here. 
Einstein® Home uses a time-domain re-sampling scheme to 
search for radio pulsars in compact binary orbits (Knispel 
2011). It features a fully-coherent stage, which removes the 
frequency modulation of the pulsar signal from binary mo- 
tion in circular orbits; full details are given in Section 4.9. 
The number of trial waveforms is so large that the required 
computational resources can only be obtained with volunteer 
distributed computing. 

2.5. The Berkeley Open Infrastructure for Network 
Computing 

Like the majority of volunteer computing projects, Ein- 
stein® Home is built on the Berkeley Open Infrastructure for 
Network Computing (BOINC) platform. This tool-kit pro- 
vides a common infrastructure for all the necessary server and 
client-side software, as well as community features like mes- 
sage boards and web pages. In fact Einstein® Home was one 
of the early adopters of BOINC, and Einstein® Home devel- 
opers have made extensive contributions to the BOINC soft- 
ware base, particularly in the scheduling system, which deter- 
mines what work, and how much work, to send to host com- 
puters. 

Volunteer computing differs from traditional "grid comput- 
ing" or the use of dedicated clusters, because resources are un- 
reliable, insecure, and sporadically available, and are donated 
by participants who are anonymous and unaccountable. This 
creates special requirements for infrastructure software; for 
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example, volunteer computing projects must use techniques 
such as redundant computing to verify the correctness of re- 
sults. 

The Berkeley Open Infrastructure for Network Computing 
(BOINC) project was created in 2002 to provide general- 
purpose infrastructure software for volunteer computing. 
BOINC is based at the University of California Berkeley 
Space Sciences Laboratory, and is funded by the National Sci- 
ence Foundation. The software is open source, and released 
under the Lesser GNU Public License (LGPL). The funda- 
mental design principle is that every aspect of the volunteer 
computing system is unreliable (perhaps even maliciously so) 
apart from the central project servers. BOINC provides (1) 
a central service that distributes work, collects results, and 
keeps track of hosts, (2) a client (run on volunteered hosts) 
that communicates, manages computation and storage, and 
shows graphics, and (3) web pages to show account and team 
information to volunteers, and to provide "community ser- 
vices" such as message boards and chat forums. 

For scientists, BOINC makes it possible to create and main- 
tain volunteer computing projects for a wide range of appli- 
cations. Each project runs its own servers, and is completely 
independent of other projects. A single project can support 
multiple programs and executables, doing different scientific 
work. For volunteers, BOINC 's design allows participation 
in multiple projects, and provides individual control over how 
the resources are allocated among them. 

Einstein® Home benefited from adopting the proven 
BOINC infrastructure. To meet some of the special needs 
of Einstein® Home, BOINC was also enhanced and extended 
in a way that made those new features available to the entire 
BOINC conmiunity. 

2.6. BOINC Internals 

A BOINC project like Einstein@ Home has two sides: the 
client side, consisting of the volunteered host computers 
(called "hosts") and the server side, which are the comput- 
ers owned and administered by the project (called "the project 
servers"). The Einstein@Home project servers are geograph- 
ically distributed; some are at the University of Wisconsin - 
Milwaukee (UWM) and some are at the Albert Einstein Insti- 
tute (AEI) in Hannover, Germany. 

BOINC CLIENT SIDE 

The "BOINC Client" is the most important program run- 
ning on the host. This program does not itself do any sci- 
entific computation. Instead, it manages and administers the 
running of application executables supplied by one or more 
projects such as Einstein@Home, which the volunteer has 
signed up for. The BOINC client communicates with the dif- 
ferent project servers by sending them small XML files (called 
"scheduler requests") and receiving small XML files in re- 
sponse (called "scheduler replies"). When it detects that the 
host is idle, it picks a task from one of the projects, down- 
loads any needed input data and executables from the project 
servers, verifies that they have the correct md5 sums and sig- 
natures, and starts to run the task (either from the start, or 
from a previously- saved checkpoint). The BOINC client uses 
scheduling algorithms to determine when to run a particular 
task from a particular project, and when new tasks and/or data 
must be downloaded. It manages the uploading of completed 
work, reports the exit status (and any errors) from the exe- 
cutable, monitors tasks to be sure they are not using too much 



CPU time, memory, or disk space, and signals tasks when it 
is time to checkpoint. 

The executables which the BOINC Client runs on host ma- 
chines are called "applications"; they do the scientific work. 
In the case of Einstein® Home they read data files containing 
instrumental or detector output, search it for candidate sig- 
nals, and write the most significant candidates to a file; a full 
description is given in Section 4.9. 

When instructed by the BOINC Client, applications check- 
point: they save enough information to return to the current 
state in the computation, so that if interrupted the computa- 
tion can be completed without starting from the beginning. 
The Einstein® Home application checkpointing is described 
in Section 4.11. 

BOINC application programs are very similar to con- 
ventional C-language programs; However they are linked 
against a BOINC application library, which handles the 
interaction with the BOINC Client. The library pro- 
vides replacements for standard input and output func- 
tions: for example F I LE ^ f open ( ) is replaced by F I LE 
^boinc_fopen(). These subroutines interact with the 
BOINC Client to ensure that input data are obtained from 
the project server, and output data are properly returned to 
the server. Another important library subroutine is int 
boinc_time_to_checkpoint ( ) . This must be periodi- 
cally called by the application, and returns a non-zero value 
if the application should checkpoint. The routine void 
boinc_f raction_done ( ) must be periodically called by 
the application to report the fraction of work completed; the 
argument is typically the ratio of the outermost loop-counter 
to the total number of iterations. The last essential library 
routine is void boinc_f inish ( ) , which is called by the 
application to report its exit status. This is zero if the applica- 
tion completed correctly, or a non-zero error code if a problem 
was encountered and the application had to exit prematurely. 

BOINC SERVER SIDE 

A BOINC project server consists of server computers, 
which are owned and administered by the project, and 
programs/processes running on that hardware. For Ein- 
stein® Home these are located in four 19-inch equipment 
racks in a computer server room in the UWM Physics De- 
partment; similar components are located in the Atlas Cluster 
room at the AEI. There are also a handful of data download 
mirrors, located at participating academic institutions in the 
USA and Europe. 

The programs/processes running on the Einstein® Home 
project servers are typical of all BOINC projects, and are il- 
lustrated in Figure 1 . Each box denotes an independent com- 
puter program; in the case of Einstein® Home these are run- 
ning on three different computers at two locations. As shown 
in the figure, some of the BOINC components are generic: the 
same for all BOINC projects. Other components are custom- 
made for Einstein® Home. 

The programs are coordinated through a single central 
MySQL database, which runs on a high-end server, and is 
the "heart" of the project. The most important database tables 
are the User Table, which has one row for for each registered 
volunteer, the Host Table, which has one row for each host 
computer that has contacted the Einstein® Home project, the 
Work Table, which has one row for each Workunit (described 
later), and the Result Table, which has one row for each sep- 
arate instance of the Workunit, that is completed, in progress, 
or not yet assigned to a specific host. (For validation pur- 
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Figure 1. A schematic of the most important processes running on the Ein- 
stein® Home project servers, located in the USA and in Germany. They com- 
municate with a single central database, that ensures coordinated operation 
and ties the parts together. The components in grey boxes are generic to 
all BOINC projects and simply taken "unmodified" from the public BOINC 
software package. The components in white boxes have been specifically 
adapted or written for Einstein® Home. The gravitational-wave (GW) and 
7-ray workunit generator, file deleter, validator, assimilator are not listed in- 
dividually but simply labeled "daemons". The arrows pointing externally 
represent network communication with BOINC clients and volunteers. The 
download servers which provide astrophysical data to the BOINC clients are 
not illustrated. 

poses, more than one result is obtained for every workunit, 
so separate tables are used for work and results.) There are 
other tables which are less central and not described here, for 
example the Forum Table contains community message board 
items posted by project staff or volunteers. 

The majority of the other Project Server components are 
long-running background processes. They typically query 
the database for a particular condition, take some action if 
needed, then sleep for some seconds or minutes. For example 
the Validator checks a database flag to see if there is a worku- 
nit with a quorum of completed results. If so, it compares the 
results as described in Section 4.15 to see if they agree. If they 
agree, it labels one of these as the "correct" (canonical) result, 
grants "computing credits" to the volunteers whose hosts did 
the work, and marks the workunit as completed. If the re- 
sults do not agree, it sets a flag in the database, which will 
then be seen by the transitioner, which will in turn generate 
a new result for that same workunit^^. Another example is 
the Workunit Generator, which creates the rows in the work 
table. Each row contains the name and version of the applica- 
tion program to run, the correct command line arguments and 
input file name(s), estimates of the required CPU-time and 
memory size required, and so on. 

An additional set of project server components communi- 
cates with hosts. The File Upload Handler receives com- 
pleted results from the BOINC client, through the normal 
HTTP port 80. This ensures that any host which has web 
access can be used to run Einstein@Home. The Scheduler 
parses the XML scheduler request files from the BOINC 
client. These typically contain requests for new work, or re- 
port completed work that has been uploaded as just described. 
The Scheduler then queries the database to find new work 
suitable for the host, or updates the database to mark that a 
result has been obtained, and sends an XML reply to the host. 
On Einstein@Home, Scheduler requests typically arrive at the 
project server at a rate of several Hz. 

The name "result" can be misleading. When it is first created, the "re- 
sult" has not yet been assigned to a host; it is simply a line in the database 
Result Table, and may be thought of as the potential result of some future 
computation. Only later, after the "result" has been assigned to a host, and 
the host has carried out the computation and returned its output to the server, 
does the "result" actually represent the result of a completed computation. 



BOINC WORKFLOW AND VALIDATION 

As explained above, the fundamental design principle of 
BOINC is that everything is unreliable, even maliciously so, 
with the exception of the Project Servers. Thus, when work 
is sent to hosts, a correct result might be returned, an incor- 
rect result might be returned, a maliciously "falsified" result 
might be returned, or the host machine and its work might 
simply vanish, never again contacting the Project Server. In 
this hostile environment, BOINC achieves reliability through 
replication and validation. 

To implement this, the components shown in Figure 1 op- 
erate as a state machine. Initially a workunit is created (for- 
mally: a row in the Work Table) by the workunit generator. 
The transitioner then creates a quorum of "unsent results". 
These are rows in the Result Table, not yet assigned to hosts. 
During its the first year of operation, Einstein@Home used a 
quorum of three; since then it has used a quorum of two: to be 
recognized as valid, "matching" results must be returned from 
hosts owned by at least two different volunteers. The sched- 
uler receives requests from hosts, and eventually assigns the 
results to suitable host machines owned by different volun- 
teers. The results are then marked with the identity of the host 
and with a deadline that is typically two weeks in the future. 

If the computation for the two results is finished and re- 
turned to the server within the deadline, then they are com- 
pared by the validator (described in more detail in Sec- 
tion 4.15). If they agree, then one of the results is chosen 
as the canonical result, both hosts and volunteers are credited, 
and the workunit is over. If the results do not agree, or if one 
of the results did not run to completion and generated a non- 
zero exit code, or if a result is not returned to the server by the 
deadline, then the transitioner generates another result (again, 
a row in the Result database table) which is subsequently sent 
by the scheduler to yet another host owned by yet another an- 
other user. This process continues, until a quorum of valid 
results is obtained. 

To date, in the Einstein@Home search of the PALFA 
dataset, approximately 65 million workunits have been gen- 
erated and completed. 

3. THE PALFA SURVEY 

The Pulsar ALFA Survey (Cordes et al. 2006) was pro- 
posed and is managed by the PALFA Consortium, consisting 
of about 40 researchers (including students) at about 10 insti- 
tutions around the world. Since 2004, operating at 1 .4 GHz, 
it has been surveying the portion of the sky that is visible to 
Arecibo (zenith angle less than 20°) within ±5 deg from the 
Galactic plane. To carry out a complete survey will require 
about 47k separate pointings of the 7-beam system, or about 
330k separate beams of data. 

Within our Galaxy it is estimated that approximately 20 000 
normal radio pulsars and a similar number of millisecond pul- 
sars (MSP) beam toward Earth. The PALFA survey is the final 
step before a full census of Galactic radio pulsars is obtained 
with next-generation telescopes such as the Square Kilometer 
Array (Cordes et al. 2004). Taking into account achievable 
sensitivities and radio scattering limitations, approximately 
half of these objects are plausibly detectable with SKA. Ap- 
proximately 1 % of these potentially-observable radio pulsars 
are double neutron- star binaries (DNS), and about 2/3 of the 
MSPs are in binaries with white-dwarf companions. About 
1/4 of all of these systems are within the portion of the sky 
covered by the Arecibo beam. The PALFA survey was initi- 
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ated to find these pulsars, and to identify the rare systems that 
give high scientific returns and act as unique physical labora- 
tories. 

Radio pulsars continue to provide unique opportunities for 
testing theories of gravity and probing states of matter other- 
wise inaccessible to experimental science. Of particular inter- 
est are pulsars in short-period orbits with relativistic compan- 
ions, ultrafast MSPs with periods P < 1.5 ms that provide 
important constraints on the nuclear equation of state, MSPs 
with stable spin rates that can be used as detectors of long- 
period (> years) gravitational waves (Kramer et al. 2004), 
and objects with unusual spin properties, such as those show- 
ing discontinuities ("glitches") and apparent precessional mo- 
tions, both 'free" precession in isolated pulsars and geodetic 
precession in binary pulsars. Long period pulsars (> 5 s) 
are of interest for understanding their connection with magne- 
tars. Pulsars with translational speeds (revealed through sub- 
sequent astrometry) in excess of 1000 km s~^ constrain both 
the core-collapse physics of supernovae and the gravitational 
potential of the Milky Way. 

There is also a long-term payoff from the totality of pul- 
sar detections, which can be used to map the electron density 
and its fluctuations, and map the Galactic magnetic field. In 
the same vein, multi- wavelength analyses (including infrared, 
optical and high energy observations) of selected objects pro- 
vide further information on how neutron stars interact with 
the ISM, on supemovae-pulsar statistics, and on the relation- 
ship of radio pulsars to unidentified sources found in surveys 
at other wavelengths. 

IMPORTANCE OF, AND EXPECTED NUMBERS OF, 
PULSARS IN SHORT-ORBITAL-PERIOD BINARIES 

Strong-field tests of gravity using pulsars have a notable 
history. The Hulse-Taylor binary PSR B1913+16, a DNS 
with a 7.75-hr orbital period, loses orbital energy via grav- 
itational radiation precisely as predicted by general relativ- 
ity. Measurements of post-Newtonian orbital effects permit 
the neutron star masses to be measured to high precision, and 
provide high-precision tests of the consistency of general rel- 
ativity. The shorter 2.4 h orbital period of the double pulsar 
J0737— 3039 provides even better tests of general relativity 
(Kramer & Wex 2009). There are strong incentives to search 
for binaries with still shorter orbital periods; such compact 
systems would provide further stringent tests of general rela- 
tivity. But short orbital-period systems containing active radio 
pulsars are rare, so any new discoveries are extremely impor- 
tant. 

It is not difficult to estimate the number of short orbital- 
period DNS in the Galaxy. We only need an estimate for the 
DNS Galactic merger rate, and a formula for the lifetime of a 
DNS system as a function of its orbital period Porb- Current 
estimates (Abadie et al. 2010) for the DNS Galactic merger 
rate arei? ~ ]^q-5±o.8 yj.-i gravitational wave inspiral 
time for a circular system of two 1 .4 solar-mass neutron stars 
starting from orbital period Porb is r = ro(Porb/^o)^^^, with 
To ~ 7.1 Myr and Pq = 1 hr. Thus the expected period for 
the most compact DNS in our Galaxy is determined by Rr = 
1, implying that the shortest-period DNS in our Galaxy should 
have a period P = Po(Pto)~^/^ = 12 min (the above range 
of R values yields shortest expected periods from 6 min to 
24 min). The only assumptions are that the orbital eccentricity 
is small at the shortest expected orbital period, and that most 
DNS systems are bom with orbital periods short enough that 
their inspiral time is much less than the Hubble time, 13 Gyr. 



Both assumptions are reasonable: some discussion of the first 
may be found in Section 4.5. 

To estimate of the number of short orbital-period DNS sys- 
tems one might expect to find in PALFA data, we also need 
to know what fraction of these systems beam towards Earth. 
Equation 15 of Tauris & Manchester (1998) predicts beam- 
ing fractions of 30-40% for pulsars having period less than 
^ 200 ms; 20% seems a reasonable compromise between 
short-period pulsars (which tend to have broader beams) and 
long-period pulsars that have narrower beams. 

To be detectable in PALFA data, the pulsars must not only 
beam towards Earth, they must also lie in the part of the sky 
visible to PALFA. Simulations of the DNS population show 
that these systems are concentrated towards the Galactic plane 
and the Galactic center (Kiel et al. 2010). We estimate that 
^ 25% of the DNS population falls within the sky area cov- 
ered by PALFA. Thus, multiplying the beaming and coverage 
factors, we conclude that ^ 5% of all DNS systems should 
be detectable in PALFA data. This number agrees well with 
a similar estimate for the detectability of DNS in the Parkes 
multi-beam pulsar survey (Oslowski et al. 2011). 

If 5% of DNS systems are detectable in the PALFA survey, 
the merger rate of this subset is 0.05 R; setting 0.05 Rr = 1 
increases the shortest-expected orbital-periods by a factor of 
0.05"^/^ ^ 3.1. Thus there might well be a DNS system vis- 
ible in the PALFA survey with an orbital period of 37 min 
(the range of R values given above yields shortest-expected 
orbital periods ranging from 19 to 74 min). Because the prob- 
ability distribution of intervals between events in a Poisson 
process is exponential, these ranges include 68% of the prob- 
ability; doubling them includes 86.5% of the probability. 

One can derive similar ranges by scaling from the ob- 
served numbers of longer-period systems. We estimate that 
the Galaxy may contain as many as 2 000 DNS binaries, with 
periods < 10 hr, of which ~ 20% would beam toward us^^. 
Using the period/lifetime scaling relationship above (modulo 
assumptions about birth orbital periods, whose probability 
distribution must be convolved with that due to GW emis- 
sion) there should then be about 50 DNS systems with peri- 
ods smaller than the 2.4-hr period of the double pulsar J0737- 
3039, or about ten that beam toward us. These numbers also 
suggest that there will be 1 object beamed toward Earth 
with a 1-hr period or less, consistent with our estimate in the 
previous paragraph. Given the uncertainties, there is a rea- 
sonable chance that such a DNS binary can be found in the 
PALFA survey. 

In addition, some neutron-star/white-dwarf binaries will 
also spiral in from GW emission while the MSP is still ac- 
tive as a radio pulsar (Ergma et al. 1997). Given that these 
systems are far more numerous than DNS binaries, and that 
pulsars in neutron- star/white-dwarf binaries are longer-lived 
millisecond pulsars, there should be a sizable number visible 
in PALFA data with orbital periods less than 1 hr. 

Although the prospects are not encouraging, it would be 
very exciting to discover a radio pulsar in orbit about a black 
hole. This would likely consist of a neutron star with a canon- 
ical magnetic field ~ 10^^ G, because the more massive black 
hole progenitor would have formed earlier. Unfortunately 
the relatively short radio-emitting lifetime of canonical pul- 
sars compared to recycled pulsars, along with the expected 

The formula in the previous paragraph overestimates the number of sys- 
tems with periods of 10 hr, because such systems are formed with eccentric 
not circular orbits, emit gravitational radiation more rapidly, and decay faster. 
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smaller absolute number of neutron- star/black-hole binaries 
compared to DNS binaries, suggests that the number of de- 
tectable objects in the Galaxy is small. 

3.1. Data Acquisition Spectrometers: WAP PS and Mocks 

As briefly described in Section 1, data are taken with the 
Arecibo L-band Feed Array (ALFA): a seven feed-horn, dual- 
polarization, cryogenically-cooled radio camera operating at 
1.4 GHz (Cordes et al. 2006). The polarizations are summed, 
to produce an RF signal centered on ~ 1.4 GHz. Until 
2009, the PALFA survey used correlator systems, the Wide- 
band Arecibo Pulsar Processors (WAPPs) to compute and 
record correlation functions every At = 64 /is. These mix 
a 100 MHz bandwidth to baseband and calculate the autocor- 
relation for 256 lags. The correlation functions are recorded 
to disk as two-byte integers combined with appropriate header 
information in a custom format. The Einstein@Home analy- 
sis used data sets of 2^^ samples, covering 268.435456 s. 

The 64 /is sample interval was chosen because many pul- 
sars have small duty cycles W/P <C 1 (where W is the 
pulse width and P the spin period) yielding ^ P/W harmon- 
ics that can be combined into a test statistic (the harmonic 
sum). The fast sampling retains sensitivity to spin periods as 
short as P ~ 1 ms combined with duty cycles as small as 
W/P ~ 1/16. If it were not for the practical constraints of 
hardware and data volume, even faster sampling would be de- 
sirable. 

The complete set of autocorrelation functions for a single 
268 s pointing is recorded in 12 files, each ^ 2 GB in size. 
Each set of 3 files contains the data for two beams. (The last 
set of 3 files contain one "phantom" beam of zeros, or a copy 
of another beam.) 

Since April 2009, PALFA has used broader-band higher- 
resolution Mock spectrometers that incorporate digital 
polyphase filter banks (Mock 2007). The Mock spectrome- 
ters cover a frequency bandwidth of 300 MHz, from 1.175 to 
1.475 GHz in 1024 channels, with a sample time of 64 /is and 
a dynamic range of 16 bits per sample. The operational plan 
is to cover the entire survey region (330 000 beams) with this 
higher-resolution system. 

The Mock spectrometers acquire data with 16-bit resolu- 
tion, which is more than we need. To reduce the burden of 
transfer and storage, data are rescaled to 4-bit resolution at 
Arecibo Observatory. To help preserve weak pulsar signals 
in Gaussian-like noise, the rescaling-algorithm clips outliers 
(typically arising from RFI). For each 1 -second chunk of data, 
the median /i and RMS a are computed for each channel. The 
data are clipped to the range (/i — 2.5cr, /i + 3.5a), the floor is 
subtracted, then the data are rescaled to 4 bits. The floor sub- 
traction also flattens the 1 -second average bandpass response. 
The offset and scaling factors (per channel, per chunk) are 
saved in the data structure, and could be used to approximate 
the original 16-bit data if desired. 

The WAPP data was originally acquired and stored in 16- 
bit format. In 2011, to reduce the storage volume, it was also 
reduced to 4-bit format. The expected total data volume from 
the complete PALFA survey is expected to be about 700 TB. 

3.2. Historical Data Acquisition and Processing Rates 

In order to understand how Einstein@Home can be used for 
analysis of PALFA data, we need to compare the current and 
historical data acquisition rates to the Einstein® Home data 
processing rate. On average, PALFA has been granted about 



265 h of telescope time per year. About 12% of the time is 
used for follow-up confirmation and initial timing of newly- 
discovered pulsars. Overhead (telescope slewing, calibration) 
consumes another 12%. So about 200 hr of actual survey data 
are obtained each year. 



Calendar 


Center 


Total 


Beams 


Beams 


Year 


Time 


Time 


acquired 


analyzed 


2004 


69 h 


108 h 


15 149 P 




2005 


278 h 


365 h 


25 320 W 




2006 


250 h 


360 h 


28 649 W 




2007 


72 h 


143 h 


11 275 W 




2008 


182 h 


184 h 


6 640W 




2009 


180 h 


186 h 


6 832M 


6130 W 


2010 


249 h 


275 h 


10 066 M 


60 032 W 


2011 


175 h 


434 h 


24 710 M 


7430M 


2012 


83 h 


334 h 


14 126 M 


27 861 M 


TOTAL 


1538h 


2 389h 


15 149 P 










71 844 W 


66 162 W 








55 734 M 


35 291 M 



Table 1 

Annual observation times and data collection volumes for the PALFA blind 

search survey at the Arecibo Telescope, and for Einstein® Home data 
processing. "Center time" denotes observations towards the Galactic center; 
"Total time" is the sum of this and the anti-center pointing time. "W" and 
"M" indicate WAPP or Mock spectrometer data; "P" indicates pre-survey 
WAPP data, not analyzed by Einstein® Home. 

The annual telescope time (Galactic center and total) and 
data collection volumes are shown in Table 1 from the begin- 
ning of the PALFA survey in 2004. The numbers are lower in 
years when there were no (commensural) observations antipo- 
dal to the Galactic center. Painting work in 2007 and platform 
repairs in 2010 also reduced observing time. The fourth col- 
umn lists the number of beams of blind- search survey data 
acquired in that year. If everything works correctly, seven 
beams are acquired in parallel for each telescope pointing. 
Before 2009, only WAPP data were collected; afterwards data 
were collected with the Mock spectrometer. The last column 
shows the number of beams processed by the Einstein@Home 
data analysis pipeline^^. The overall processing speed of the 
Einstein® Home data analysis pipeline is discussed in Sec- 
tion 4.13. As shown in Table 1, as of the end of 2012, af- 
ter nine years of operation, the PALFA survey had acquired 
142 767 beams of blind- search survey data^^ . 

The accounting of beams of WAPP survey data searched 
by Einstein@Home is as follows. The 15 149 beams of 2004 
WAPP data were taken in a pre-survey (pi 944) mode. These 
were not searched by Einstein® Home because they had a 
shorter time-baseline than the p2030 data that followed, and 
the sky pointings were repeated in the p2030 pointings. Of the 
original 71 844 beams of WAPP p2030 data, 995 beams were 
not transferred to AEI, and 70 849 beams were transferred to 
AEL Of these, 2 102 beams were not sent for pre-processing 
because the corresponding data file counts were incorrect; 
68 747 beams were sent to pre-processing. Of these, 1 591 
beams could not be pre-processed because of data corruption 
or scaling or similar issues; 67 156 beams were sent to Ein- 
stein® Home hosts for processing. Of these, 994 beams had 

During much of 201 1 , Einstein® Home was occupied with re-processing 
data from the Parkes Multi-Beam Pulsar (PMPS) survey carried out in 1998- 
1999. Hence the number of PALFA beams processed was small. The results 
of the PMPS search are reported in Knispel et al. (2013). 

This count does not include data collected for confirmation or follow-up 
observations. 
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enough errors during run-time that the corresponding worku- 
nits errored-out or were canceled. Hence 66 162 beams of 
WAPP data were fully-searched by Einstein® Home. 

As of January 1st 2013, Einstein® Home had analyzed a to- 
tal of 101 453 beams (66 162 WAPP and 35 291 Mock); it is 
currently processing about 160 beams of Mock data per day 
(see Section 4.13 for details). Provided that sufficient tele- 
scope time is granted, the survey will continue and will even- 
tually be extended to higher Galactic latitudes. We expect 
this will increase the yield of millisecond pulsars, since they 
are distributed more widely and their detection is inhibited by 
multi-path propagation (interstellar scattering) that is stronger 
at low Galactic latitudes. 

3.3. Data Storage and Movement 

Data are recorded to RAID storage systems at the Arecibo 
Observatory. Disks containing the data are then shipped to 
the Cornell Center for Advanced Computing (CAC), where 
the raw data are archived on RAID storage systems. The 
CAC also hosts web and database servers that allow uploads 
and downloads of raw data and of processed data products 
to and from the different PALFA institutions. For the Ein- 
stein® Home search, the raw data are transmitted over the In- 
ternet using GridFTP from CAC to the Albert Einstein Insti- 
tute (AEI) in Hannover, Germany. There, they are stored on a 
Hierarchical Storage Management (HSM) system, composed 
of a Fiber-Channel disk array backed by a large capacity tape 
robot. 

4. THE EINSTEIN@HOME RADIO PULSAR SEARCH 

The following is a detailed description of how the E@H 
radio pulsar search works. 

4.1. Preparation of the PALFA data 
WAPP DATA 

Before being sent to host machines, data is prepared in a 
series of pre-processing steps. The first step is Fourier trans- 
formation of the autocorrelation functions. This produces 
dynamic power spectra with 256 frequency channels span- 
ning 100 MHz (390 625 Hz per channel). The channeliza- 
tion allows compensation for the dispersive propagation of 
any pulses from celestial sources. 

At AEI, preprocessing is performed separately for each 
group of three files containing the autocorrelation functions 
for two beams. A script preprocess . sh calls the Cor- 
nell/ALFA program alphasplit to split the files into two 
sets of three files, each containing data from a single beam. 
For each beam, the script then calls filterbank from the 
SigProc package (Lorimer 2008). This reads the three files 
containing data for that beam. The output is a small text 
header, and a 4 GB file containing 2^^ time samples of a dy- 
namic power spectra with 256 channels; power is represented 
as a 4-byte float. The header is combined with the data using 
addheader; the resulting files (one per beam) are the input 
to the Einstein® Home Workunit Generator. 

MOCK DATA 

The first step in the preparation of the Mock data combines 
two sub-band files into a single psrfits files covering 300 MHz 
bandwidth with 960 channels. The Mock data used for the 
Einstein® Home pipeline consist of two 4-bit psrfits files for 
each beam. Each file covers a bandwidth of 172.0625 MHz 



Table 2 

Set of DM trial values used in the Einstein® Home search of the PALFA 
WAPP (upper half) and Mock (lower half) data. 



DM range 


ADM 


number of trial values 


(pc cm~^) 


(pc cm~^) 




to 212.4 


0.6 


355 


212.4 to 348.4 


1 


136 


348.4 to 432.4 


2 


42 


432.4 to 1002.4 


6 


95 


to 213.6 


0.1 


2136 


213.6 to 441.6 


0.3 


760 


441.6 to 789.6 


0.5 


696 


789.6 to 1005.6 


1.0 


216 



in 512 channels, one file contains data from a band cen- 
tered on 1450.168 MHz, the other from a band centered on 
1300.168 Mhz. The sub-band files have sizes of ^ 1.2 GB, 
the combined single files have sized of^ 1.9 GB. 

For the resulting file, a radio frequency interference (RFI) 
mask is computed using presto (Ransom et al. 2002, 2003; 
Ransom 2002) software tools. Furthermore, strong periodic 
RFI is identified and added into a beam- specific "zap list". 
The RFI mask is used in the generation of the work units (see 
next Section), while the zap list is sent to the Einstein® Home 
hosts with all work units of a given beam. 

4.2. Workunit Generation 

The workunit generator has been described in connection 
with Fig. 1. It is an "on demand" BOINC server process that 
prepares data files and "processing descriptions" for the com- 
putational work done on Einstein® Home hosts. The workunit 
generator reads as input one data file per beam^^, prepared 
as described in Sec. 4.1. As output it generates data files 
(628 per WAPP beam, 3808 per Mock beam) which are later 
downloaded by Einstein® Home hosts for analysis. Each of 
these files contains one de-dispersed time series, for a dif- 
ferent value of the dispersion measure (DM). The workunit 
generator also creates one row in the database Work Table for 
each beam and for each DM value; these contain information 
such as the conmiand-line arguments for the search applica- 
tion. 

To generate workunits from the WAPP input data files, the 
data for each beam are de-dispersed with 628 different DM 
values, and then down- sampled by a factor of two to 128 /is. 
For the WAPP data, a single de-dispersed time series has 2^^ 
time samples with 32 bits per sample, yielding 8.3 MB per 
time series. 

The discrete DM values are piecewise linear with four dis- 
tinct slopes as shown in Table 2; they range from to a 
maximum of 1002.4 pccm~^. Since there are (mostly inner- 
Galaxy) pulsars with even larger DM values, we may increase 
this maximum in future searches: compact HII regions can 
create significant additional dispersion. The spacing at small 
DM is set by the requirement that the "smearing" over the en- 
tire observed radio bandwidth arising from the discreteness of 
DM is less than one sample time. At larger DMs, the smearing 
over a single frequency channel becomes the dominant effect. 
Also, the increasing electron density along the line of sight 
leads to multi-path scattering and pulse broadening (Lorimer 
& Kramer 2004), which creates an effective time- smearing 
larger than the sampling time. Work by Bhat et al. (2004) de- 
rived a heuristic relationship between this pulse broadening 

For the Mock data, the RFI mask is also read in through auxiUiary files. 
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and DM; the pulse broadening increases slightly faster than 
quadratically with DM. The increasing DM spacing shown 
in Table 2 is obtained by requiring that the time- smearing 
arising from DM discreteness is smaller than the effective 
pulse broadening from single-channel smearing and multi- 
path scattering. Further details may be found in Sections 2.4.2 
and 3.7.2 of Knispel (2011). 

For the generation of workunits from the Mock data, 3808 
different trial DM values up to 1005.6 pccm~^ are used, de- 
termined with the DDplan.py tool from PRESTO and shown 
in Tab. 2. The de-dispersion is done with other tools from 
the same software suite, using the previously mentioned RFI 
masks to replace broad- and narrowband RFI bursts by con- 
stant values. Mock data are not down- sampled, so there are 
2^^ samples per de-dispersed time series. We initially used a 
dynamic range of 8 bits per sample but halved it to 4 bits early 
in 2012 to reduce Internet bandwidth. The de-dispersed time 
series generated from Mock data currently have file sizes of 
2.1MB. 

The workunits cannot all be generated at once: the result- 
ing data files would overflow the Einstein® Home download 
storage servers. It would also create a huge number of rows in 
the Work and Result Tables, overloading the Einstein@Home 
database server. So the Workunit Generator is automatically 
run "on demand" when the amount of unsent work drops be- 
low a low-water mark; it is automatically stopped when the 
amount of work reaches a high-water mark. In this way, 
the project typically maintains a pool of tens of thousands of 
unassigned results. 

To reduce the load on the Einstein® Home database server 
and increase the run-time per host, up to eight de-dispersed 
time series are bundled into a single work unit, as discussed 
in Sec. 4.13. 



4.3. Signal Model and Detection Statistic 

In searching for possible signals hidden in noise, a model 
for the signals is required. Here, we describe the model used 
for the signal from a constant- spin-rate neutron star in a cir- 
cular orbit with a companion star. 

The phase model ^ for the fundamental mode of the signal 
emitted by a uniformly rotating pulsar in a circular orbit of 
radius a can be written in the form 

$ {t- A) = 27r/ (t + sin (Oorb^ + ^)] + ^o, (1) 



where / is the apparent spin frequency of the pulsar^^, t is 
time at the detector, and a sin (i) is the length of the pul- 
sar orbit with inclination angle i projected onto the line of 
sight. The orbital angular velocity l^orb is related to the or- 
bital period Porb via l^orb = 27r/Porb- The angle denotes 
the initial orbital phase and is the initial value of the sig- 
nal phase. A denotes the ensemble of signal phase parameters 
A = {/,asin (i) , ^orb, ^, ^o}. 

The time-domain radio intensity signal is a sum of instru- 
mental and environmental noise J\f{t) and a pulsar signal 



This model accurately describes the rotation phase of the pulsar for 
some minutes, which is sufficient for the detection process. For longer-term 
observations (see Section 6.3) a more complete and accurate phase model is 
required, for example including additional terms to describe a slow secular 
spin-down. With longer observations, parameters such as the frequency / 
can be determined with great precision; by convention it is then defined with 
respect to time at the Solar System barycenter at a particular fiducial time. 



formed from harmonics of this fundamental mode 



5(t;A) =Ar(t) + ^5, (t;A) 



(2) 



n=l 



where the intensities of each harmonic are given by 

Sn {t; A) = 3? [An exp {t; A)]] . (3) 

The An are the complex amplitudes of the different signal har- 
monics; their values are determined by (or define) the profile 
of the observed de-dispersed radio pulse. 

We define a detection statistic Vn for the n'th harmonic 
through correlation of the radio intensity with the n'th nor- 
malized signal template exp [— m$ (t; A)] for the putative 
signal. This detection statistic is optimal in the Neyman- 
Pearson sense: thresholding on it minimizes the false- 
dismissal probability at fixed false-alarm probability (Allen 
et al. 2002). It can also be obtained by maximizing a signal-to- 
noise ratio (SNR), under the assumption that the initial phase 
^0 is unknown and has a uniform probability distribution. 

In a search for pulsars, the parameters A are not known, and 
so that precise point in parameter space might not be searched. 
However the signal will still appear at a nearby point A', for 
which 



^n(A,AO 



1 

- / 5 {t; A) exp [-in^ (t; A')] 
Jo 



(4) 



Note that Vn is independent of and because of the max- 
imization described above. Therefore from here onwards we 
use A = {/, a sin (i) , ^lorb, V^} to denote a point in the four- 
dimensional search parameter- space. 

If there is no pulsar signal, or it is very weak, the expected 
value of this detection statistic is proportional to the power 
spectrum of the instrumental noise in the neighborhood of fre- 
quency nf. On the other hand, if the pulsar signal is strong 
(in comparison with the noise, so J\f{t) can be neglected), then 
the expected value is 



(P„(A,A')) 



(5) 





2 


2 





^£dt exp[m($(t;A)-$(t;A'))] 

This assumes that the observation time T is much longer than 
the pulsar period: fT >> 1. 

If the instrumental/environmental noise J\f is Gaussian 
then the detection statistic Vn is described by a non-central 
distribution with 2 degrees of freedom, one coming from 
each of the real and imaginary parts of the integrand in Equa- 
tion (4). The strength of the pulsar signal determines the non- 
centrality parameter: in the absence of a pulsar signal the non- 
centrality parameter is zero. 

The detection statistics Vn for different values of n may be 
combined to form other detection statistics. If the pulse pro- 
file were known in advance, a particular weighted sum would 
be optimal. Since in practice for blind searches this is not 
the case, we need to make some arbitrary choices about what 

For some beams, the noise J\f contains strong RFI and is non-Gaussian. 
However there are many clean beams where this is not the case. For con- 
taminated beams, the event selection procedures described in Section 4.10 
also has a mitigating effect. In any case, using lower thresholds based on the 
assumption of Gaussian noise is justified: RFI does not weaken real pulsar 
signals but instead creates stronger false alarms. 
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statistics to construct, and how many such statistics to con- 
struct. 

To design statistics, we simply assume that radio pulsars 
have profiles that resemble a Dirac delta- function, truncated to 
some finite number of harmonics. A delta-function has equal 
weights in all the amplitudes (\An\ independent of n) so we 
have chosen to use statistics that equally weight the Vn up to 
some maximum harmonic. This choice also makes it simple 
to characterize the false alarm probability associated with the 
resulting statistic. 

Thus we define five detection statistics Sq, . . . , S4hy inco- 
herently summing the values of Vn 

SL = J2T^n- (6) 

n=l 

The statistic is proportional to the power in the funda- 
mental harmonic of the pulsar rotation period; the statistic 6*4 
equally weights the power in the first 16 harmonics. In the 
noise-only case the probability distribution of S*!, is 

p{SL)dSL=xlNi^SL)d{2SL)^ (7) 

which is a chi-square distribution with 2N = 2^+^ degrees 
of freedom. 

The false-alarm probability pfa is the probability that Sl 
exceeds some threshold value in the absence of a signal. 
This is given by the area under the tail of the probability dis- 
tribution ppA = Q2N where 

Q2N (x) = T{x;2N) = £ dy y^^-^e"^ (8) 

is the complement of the cumulative distribution function: 
the incomplete upper Gamma function. This may be easily 
computed by means of analytical or numerical approxima- 
tions. 

The detection statistic is unlikely to assume large values 
in random Gaussian noise; large values are indications that a 
pulsar signal may be present (or that RFI is providing a sig- 
nificant background of non-Gaussian noise). We define the 
significance of such a candidate as 

5(5L) = -iogio(m)- (9) 

A candidate with significance of (say) 30 has a probability of 
10~^^ of appearing in Gaussian random noise. 

4.4. Template banks 

In a search for unknown new pulsars, as explained be- 
fore Equation (4), one evaluates the detection statistics 
S'l(A') at many points in the parameter space A = 
{/, a sin (i) , Qorb, '0}- In order to enhance the statistical like- 
lihood of detection (to maximize the SNR) one would like to 
evaluate this quantity at precisely the correct point in param- 
eter space A' = A where the pulsar is located. But this is 
impossible, since the pulsar parameters A are not known be- 
fore discovery! 

In a practical search, S'l(A') is calculated for many dif- 
ferent values of A^ These "trial values" of the unknown 
pulsar parameters must be spaced "closely enough" that not 
too much SNR is lost from the mismatch between A and A^ 
However if they are spaced too closely, precious computer cy- 
cles are wasted, because S'l(A) and S'l(A') are correlated if 
AA = A — AMs small. 



The set of points in the parameter space A where the detec- 
tion statistic is evaluated is called a template grid or template 
bank. An optimal grid will maximize the probability of de- 
tection at fixed computing cost; in general it will not be a 
simple regular Cartesian lattice with uniform spacings along 
each axis. Within the gravitational- wave detection commu- 
nity, substantial research work has shown how to construct 
optimal or near-optimal template grids (Owen 1996; Owen & 
Sathyaprakash 1999; Harry et al. 2009; Messenger et al. 2009; 
Fehrmann & Pletsch 2013); we make use of those ideas and 
methods here. 

The most important tool for setting up a template bank is the 
metric (Owen 1996) on the search parameter space. To sim- 
plify matters, consider only the detection statistic 5*0 = Vq for 
the fundamental harmonic of the pulsar. The metric measures 
the loss of the expected strong-signal detection statistic which 
arises if the parameters of the search point A' are mismatched 
from those of the putative signal A. It follows immediately 
from Equation (5) that this loss is described by a quadratic 
form in AA, since the second modulus-squared term on the 
r.h.s. is maximized (at unity) if the signal and search param- 
eters match exactly (AA = 0). Thus the fractional loss of 
detection statistic (called the mismatch m) must be quadratic 
in AA as one moves away from this maximum: 

m (A, AO = 1 - = ^a^AA-AA^ + 0{^K% 

Here the indices a and h label the four parameter- space co- 
ordinates /, a sin (z), ^lorb, and ^, and we adopt the Einstein 
summation convention where repeated indices (in this case a 
and h) are summed. We assume the strong signal limit, so 
{Vo) is defined as in Equation (5). 

It is straightforward to show that Qah is a positive- 
definite symmetric quadratic form: a metric of signature 
(+,+,+,+). The components of the metric can be computed 
directly from the phase model Equation (1). A short calcula- 
tion yields 

where the^ angle brackets denote a time-average 
{Q)t = ^ Jq Q {t) dt and da denotes the partial derivative 
with respect to the a'th component of A. 

If the mismatch is small (positive, but much less than unity) 
then the surface of constant mismatch is a ellipsoid in parame- 
ter space. The problem of efficient template bank construction 
is to cover the desired part of parameter space with the small- 
est possible number of these ellipsoids for a given nominal 
mismatch ruQ. For a general (non-constant, as here) metric 
this template bank is not regular or uniform. 

The quadratic approximation in Equation (10) is inaccurate 
for typical Einstein@Home mismatches (mo = 0.2 or 0.3). 
For these values, the region of parameter- space covered by 
a template is banana-shaped rather than ellipsoidal; see Fig- 
ure 3. Thus, in creating template banks, mismatches are com- 
puted using the exact definition Equation (10) rather than the 
metric approximation. Nevertheless, the metric is still useful, 
as described below. 

4.5. Parameter space searched by Einstein@ Home 

In order to carry out a search the parameter space must be 
covered with a suitable template bank. Thus, one must decide 
what region of parameter space to cover: what range of pul- 
sar spin frequencies, orbital periods, etc. should be searched? 
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With unlimited computing resources, one could search the en- 
tire physical parameter space. In practice, Einstein@Home 
has finite computing power, so we can only search some part 
of parameter space. Just as an intelligent gambler needs to 
decide whether to play blackjack or poker, we need to decide 
where (in parameter space) to invest our precious compute cy- 
cles. What parameter-space regions are most likely to yield a 
scientific pay-off? 

The region to search is astrophysically motivated and tar- 
gets the Einstein® Home search to the most likely range of 
putative pulsar orbital parameters and spin frequencies. We 
constrain the search parameter space by setting a probabilis- 
tic limit on projected orbital radii, and by an upper limit on 
spin frequencies. 

As described in Section 2.4, standard acceleration searches 
lose sensitivity where Porb ^ lOT. For the PALFA data, this 
is Porb ^ 45min. Since other search pipelines within the 
PALFA collaboration use standard accelerations searches, the 
Einstein@Home search was set up to complement these ef- 
forts. Thus, the longest orbital period in the Einstein® Home 
search is chosen to be 45 min (plus one template for an iso- 
lated system). 

The lower limit on Porb is determined by the available com- 
puting power: as we show below, the computing cost grows 
rapidly as the minimum orbital period decreases. We choose 
Porb ^11 min, significantly increasing sensitivity to pulsars 
in compact binary systems. 

Even for these short orbital periods, for the purposes of de- 
tection, we can neglect relativistic corrections 0{{v/cf') to 
the phase model (1), because they correspond to less than a 
single cycle of phase error. In the worst case, the value of 
{v/cf ^ Ax 10"^ for Porb = 660 s and asin(i) = 0.21t-s. 
Thus, the additional phase accumulated over T = 268 s for a 
signal at / = 400 Hz is ^ fT{v/cf 0.4 < 1 cycles. 
This corresponds to an acceptable worst-case 19% loss in de- 
tection statistic. 

Our search, described by the phase model Equation (1), as- 
sumes circular orbits. There are two motivations. First, ex- 
trapolation from known pulsars in binaries suggests that by 
the time they decay to the short periods that are the new fea- 
ture of the Einstein® Home search, their orbits will be circu- 
larized by the emission of gravitational radiation. Second, it 
was easy to implement by using well-explored methods from 
gravitational- wave data analysis. (In fact as described in Sec- 
tion 4.8 the search is still sensitive to pulsars in orbits with 
eccentricities e < 0.1.) 

We now review the expectations regarding orbital eccen- 
tricity e. The majority of known pulsar/white-dwarf bina- 
ries have very small orbital eccentricities (e < few x 10~^) 
(Lorimer 2008). Thus, they are well described by the cir- 
cular phase model Equation (1). Known double-neutron star 
(DNS) system typically have larger orbital eccentricities, but 
their orbital periods are much longer than the target val- 
ues for Einstein® Home. These systems will evolve by the 
emission of gravitational waves, which over time circular- 
izes the orbits (Peters 1964). If the known DNS systems 
from Lorimer (2008) are evolved until their orbital periods 
drop to 11 min, they are well described by a circular phase 
model: the evolved eccentricities en at an 11 min orbital 
period are very small compared to the present-day values. 
This is not surprising: binaries formed with short periods 
and large en would decay rapidly through emission of grav- 
itational radiation. With the exception of PSR B 19 13+ 16 
(en = 0.0302) and PSR B2127+11C (en = 0.0416), we 



find that en < 0.005 for all known DNS systems. Highly 
evolved pulsars in such systems are therefore detectable by 
the Einstein® Home search as show in Sec. 4.8. 

Mass transfer in X-ray binaries also circularizes the orbits 
of radio pulsars in compact binaries. As Archibald et al. 
(2009) have shown. X-ray binaries can become visible as bi- 
nary radio pulsars after the accretion stops and radio waves 
from the pulsar can escape the system and reach Earth. The 
orbits of these systems are quickly circularized during the 
phase of mass transfer (Stairs 2004). For example, the X-ray 
binary with the shortest known orbital period (about 1 1 min- 
utes) is X1820 - 303 (Smale et al. 1987). If the mass transfer 
stopped and a radio pulsar emerged, it would have an almost 
perfectly circular 11 min orbit. Such objects would probably 
not be found by an acceleration search, but might be detected 
by Einstein® Home. 

The constraints on the projected orbital radius a sin(i) are 
determined by the expected ranges of pulsar and companion 
masses (Messenger et al. 2013). We allow the maximum al- 
lowed value of asin(i) to depend on pulsar and companion 
masses and on the orbital period. From Kepler's laws we find 

_ 2 

a Sin(i) < aF (mc,max, ^p,min) ^^orb ' 

where mc,max is the maximum companion mass and mp^min is 
the minimum pulsar mass. The function 

F (rncmax, mp,nun) = " (12) 

is a mass-dependent scaling factor, where G is the gravita- 
tional constant. The parameter < a < 1 bounds the or- 
bital inclination angles: for given masses mp^min and m^max. 
and given a, this condition defines an upper limit on the pro- 
jected orbital radii as a function of the orbital angular veloc- 
ity. For the Einstein® Home search we selected a = 0.5, 
^p,min = 1.2M0 and mc,max = 1.6M0. 

We can use Equation (12) to calculate the fraction p of 
the total possible solid angle 47r steradians in which the nor- 
mal vector to the orbital plane may lie. The distribution of 
possible orbital inclination angles is uniform in cos (i) and 
thus the fraction p of systems with inclinatio n angles be- 
tween and i is p = 1 — cos {%) = 1 — ^1 — sin(z)^. 
For arbitrary pulsar (mp) and companion (rric) masses, we 
may write the orbital radius as a = F {rric^m^) Q.^^^ . In- 
serting this in the l.h.s. of Equation (12) yields sin(z) < 
aF (mc,max, '^p,min) / F {uic: TTip). From this, the fraction p 
follows 

-1 /-I 9 (^c,max5 ^p,min) \ 

y V F(mc,mp) J 

This quantity, the fraction of orbital inclination vectors cov- 
ered by the Einstein® Home search parameter space, is shown 
in Figure 2. 

The Einstein® Home search parameter space is also con- 
strained in maximum spin frequency / < /max- As explained 
in Section 4.6, the number of orbital templates grows with 
/max- So one must strike a compromise, choosing a frequency 
for which Einstein® Home can detect a large fraction of mil- 
lisecond (and slower) pulsars, while not exceeding the avail- 
able computing power. The search grid is designed to recover 
frequency components up to /max = 400 Hz^^. 

For a pulsar spinning at 100 Hz, this would only recover the power up 
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Figure 2. The fraction p of total solid angle covered by the Einstein® Home 
search parameter space, Equation (13). The horizontal axis shows the pulsar 
mass and the vertical axis the companion mass. All systems in the white 
region are detectable for any inclination angle, elsewhere only a fraction p 
has favorable orbital inclinations. 
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Figure 3. The white region is the "design wedge" of orbital parameter space 
searched by Einstein@Home, as described in Sec. 4.5. The vertical axis is 
the projected orbital radius, the horizontal axis is the orbital angular velocity, 
and the initial orbital phase ip G [0, 27r) dimension is suppressed. The dots 
are the orbital template locations, constructed as described in Sec. 4.6. For 
a few orbital templates (located at the black crosses) the region of mismatch 
m < 0.3 at fixed -0 for / = /max is shown in gray. As discussed in Sec. 4.4 
the template coverage regions are banana-shaped, not ellipsoidal. 

The constraints above define a wedge of orbital parameter 
space, shown in Figure 3. 

The shorter PALFA data sets spanning T = 134 s have dif- 
ferent parameter space constraints. The orbital period range 
was halved, to 5.5 min < Porb < 22.5 min, which also sped 
up the overall data analysis. We re-invested this gain into 
searching for higher spin frequencies / < 660 Hz. The con- 
straint on the projected radius was left as in Equation (12). 

In the part of the PALFA survey using the Mock spectrom- 
eters, there also are some observations covering T = 536 s. 
For the Einstein@Home pipeline we only used the first half 
of these observations. 

4.6. Template bank construction for Einstein@Home 

For the Einstein@Home search, we have chosen to con- 
struct a template bank which is completely regular and uni- 
form in the frequency dimension. Thus, our template bank 
is the direct Cartesian product of a uniformly- spaced grid in 
frequency with a three-dimensional orbital template bank in 

to the fourth harmonic. 



the remaining parameters Aorb = {a sin (z) , l^orb, V^}- Hav- 
ing uniform frequency spacing simplifies matters and allows 
the use of Fast Fourier Transforms (FFTs) in the frequency- 
domain; FFTs are computationally very efficient if the fre- 
quency points are uniformly-spaced. 

In this paper template bank refers to the four-dimensional 
grid, and orbital template bank to the three-dimensional grid. 
To construct the orbital template bank, a three-dimensional 
"orbital" metric is obtained by projecting the metric Qah onto 
the sub- space / =constant. A detailed calculation of the four- 
dimensional metric Qah and the three-dimensional projected 
metric may be found in Knispel (201 1). 

If a metric is constant or approximately constant, then 
lattice-based methods (Owen & Sathyaprakash 1999) can be 
employed to generate templates covering the parameter space. 
However the metric here is not even approximately constant, 
and alternative methods are needed. Two simple and efficient 
methods are random template banks (Messenger et al. 2009), 
and stochastic template banks (Harry et al. 2009). 

For a random template bank, template locations are chosen 
at random with a coordinate density proportional to the vol- 
ume element: the square-root of the determinant of the met- 
ric. The expected number of templates can be calculated from 
the proper volume of the search parameter space and the cho- 
sen coverage r] and nominal mismatch mo (Messenger et al. 
2009). 

Stochastic template banks are formed in the same way, but 
then in a second step, superfluous templates (those closer than 
the nominal mismatch) are removed. 

For both random and stochastic template banks, the goal is 
to cover most, but not all, of the parameter space; the cover- 
age 77 < 1.0 describes the fraction of parameter space which 
lies within the nominal mismatch of one of the template grid 
points. 

As described, Einstein@Home template banks are a Carte- 
sian product of a one-dimensional uniform frequency grid 
with a three-dimensional orbital template bank. This affects 
the construction of the orbital bank in three important ways. 

First, the orbital template bank must be created for the high- 
est frequency used in the search. This is because the same 
orbital bank is used at all frequencies. Thus its spacing (mis- 
match) must be the finest needed at any frequency. The spac- 
ing is finest at the highest frequency /max, because the ex- 
pected detection statistic Equation (5) depends upon the dif- 
ference in phase, which varies most rapidly at the highest fre- 
quency. The total number of orbital templates required at a 
given mismatch and coverage grows like /^^^ because the grid 
coordinate spacings are proportional to l//max in each of the 
three dimensions. 

Second, this affects how mismatches are computed between 
two orbital templates, in creating a stochastic bank, as illus- 
trated in Figure 4. Because the orbital templates are repro- 
duced at every frequency bin, a given orbital template covers 
a larger region of the orbital parameter space than that defined 
by its overlap with the surface / = /max- The orbital and fre- 
quency parameters are degenerate: one can recover most of 
the detection statistic at the incorrect orbital parameter value, 
provided that the frequency value is also mismatched. If the 
frequency and orbital parameters are denoted A = {/, Aorb}, 
then the mismatch between two orbital templates is 

m(Aorb,A^,b) = niinm({/max, Aorb},{/, A^rb})- (14) 
In practice, the minimum does not occur for f widely sepa- 
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Figure 4. A schematic of template coverage in the parameter space with 
coordinates A. The vertical axis is frequency /, the horizontal axis denotes 
the orbital parameters Aobs. and the horizontal lines denote frequency bins, 
separated by A/ ^ 1 mHz and extending up to /max = 400 Hz. The dark 
dots show template locations; the ellipse denotes the coverage region (mis- 
match m = 0.2) of one orbital template. Because the four-dimensional grid 
is a Cartesian product, the orbital template is reproduced at each frequency 
bin, separated by A/. At fixed frequency a single template covers a small 
region R of orbital parameters. However the "cookie cutter" copies of the 
templates cover a much larger region Rmax of orbital parameter space, ob- 
tained by minimizing the mismatch over frequency and orbital parameters. 
The four-dimensional mismatch m = 0.3 is allowed to be somewhat larger, 
hence i^max includes a small amount of parameter-space outside the orbital 
templates. This illustration is only schematic because the coverage region of 
a template is not elliptical in shape (see Figure 3) and can extend over more 
than a hundred frequency bins. 



rated from /max, so one does not need to search a very large 
range. Typically for /max = 400 Hz the range needed is less 
than ±150 mHz. 

Third, the mismatch in the four-dimensional parameter 
space may be larger than that in the three-dimensional space; 
in this work the corresponding values are 0.3 and 0.2. 

As previously described, Einstein@Home uses five distinct 
detection statistics ^o, . . . , 6*4, which weight contributions up 
to the sixteenth harmonic of the pulsar spin frequency. How- 
ever we use the same template bank for all of these. The 
template banks are designed using only the detection statis- 
tic = Sq. Since that statistic only measures the power in 
the fundamental mode of the pulse profile, it corresponds to 
building a search optimized for sinusoidal pulse profiles. Thus 
in constructing and testing template banks, we only use noise- 
free simulated pulsar signals whose intensity profile varies si- 
nusoidally at the spin frequency. 

Because it was quick and easy, Einstein@Home initially 
used a random orbital template bank with 22 161 templates. 
However after approximately ten months of operation, this 
was replaced by a stochastic orbital template bank contain- 
ing 6 661 templates. This required an investment of computer 
time and human effort, but was justified because the orbital 
template bank is used in the analysis of every de-dispersed 
time series. 

4.7. Parallel construction of stochastic template banks 

It required about 200 kh of dedicated computer cluster time 
to produce a stochastic bank which was about half the size of 
the initial random template bank. This reduced the total Ein- 
stein® Home computing time by a factor of two, saving hun- 



dreds of millions of CPU hours. The parallelized construction 
algorithm for metric-assisted^^ stochastic template placement 
is described in full detail in Sec. 3.5 of Knispel (2011); we 
summarize it here. 

Begin by fixing the desired mismatch mo (here mo = 0.2). 
To describe the algorithm, it is useful to define operations on 
template banks. As before, a template bank (denoted A or B) 
is a set of distinct points in parameter space (denoted a orb). 
A template bank A is called non- overlapping if for all distinct 
points a, a' G A one has m(a, a') > mo. 

The algorithm works by combining pairs of template banks 
to produce new ones. For the description, it is helpful 
to define a merge and prune operation which we denote 
P. This operation takes as arguments (or inputs) two non- 
overlapping template banks, and returns (or produces) a single 
non-overlapping template bank: 

P{A, B)=AU{beB\ m(a, b) > mo for all a e A} . 

(15) 

It is easy to see that if A and B are both non-overlapping, then 
P{A^B) is also non-overlapping. It is also easy to parallelize 
into independent parts. 

The algorithm begins with 2^ non-overlapping template 
banks, and proceeds through p reduction steps, each of which 
halves the number of template banks. Each step takes as its 
input 2^ non-overlapping template banks, and produces as its 
output a set of 2^~^ non-overlapping template banks. To carry 
out a reduction step, the template banks are grouped into 2^~^ 
pairs A, B, and each pair is replaced by P(A, B). These have 
increasingly higher coverage at fixed nominal mismatch. This 
procedure continues until a single bank remains, which is the 
final output of the procedure. 

The algorithm can be trivially parallelized, because a single 
merge and prune operation can be trivially parallelized. If the 
non-overlapping template bank B is partitioned into n disjoint 
pieces B = |jr=i ^^^^ 

n 

P{A,B) = [j^P{A,B,). (16) 

This also holds if the partition is not disjoint, but is computa- 
tionally less efficient. 

In practice, the template bank B is partitioned into roughly 
equal- sized pieces so that the merge and prune operations take 
similar time. The number n of partition elements is selected 
so that the compute time required by the merge and prune 
operations (proportional to the product of the number of tem- 
plates in each argument: | A| |) is independent of the reduc- 
tion level. 

For the Einstein@Home search, we construct a template 
bank using (9(1000) CPU-cores of the Atlas computer cluster 
(Aulbert & Fehrmann 2009). The number of partitions n is 
chosen so that the merge and prune operations P(A, Bi) take 
about one hour. 

The initial input is 2^ = 1024 non-overlapping template 
banks. These are produced as random template banks, each 
containing M = 100 templates, corresponding to 7^ ^ 0.01 at 
mismatch mo = 0.2. Then all M(M — l)/2 inter- tempi ate 
mismatches (14) are computed and templates closer than mis- 
match mo are removed. 

The (square root of the determinant of the) metric is used to determine 
the coordinate-density of grid points in a random bank. However in comput- 
ing mismatches the full detection statistic (rather than the quadratic approxi- 
mation in Equation (10)) is used. 
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Figure 5. Test of the Einstein® Home template bank for pulsars in circular 
orbits with a T = 268 s data span. The bars show a histogram of the mis- 
match distribution for 20 000 noise-free signals from simulated pulsars in 
random circular orbits. The curve shows the cumulative distribution function 
(CDF) of the mismatch. The median mo. 5 and the 90%-quantile of the mis- 
match distribution mo. 9 have been highlighted. The template bank covers 
90% of the parameter space with mismatch m < 0.3. 

We compute the coverage of the final template bank with 
Monte-Carlo simulation (or integration). We begin with a 
large number of simulated noise-free signals at random points 
A in the parameter space. As discussed earlier, these have 
pulse profiles containing only the fundamental mode Ai = 
for i > 1: the detection statistic is Vo = Sq. For each sig- 
nal, the mismatch m is computed for all templates, and the 
minimum is recorded. The coverage rj is the fraction of simu- 
lated signals with mismatches m < tuq. 

The coverage can also be monitored in the prune and 
merge operations: when 99% coverage has been achieved, 
\P{A^B)\ contains 1% of the points from \B\. If sufficient 
coverage has been achieved the reduction procedure can be 
terminated "early" (before the reduction index j = 0). In 
this case, one of the 2^ remaining template banks is arbitrar- 
ily chosen as the output. 

4.8. Template bank verification 

We constructed a template bank with r] = 90% coverage 
and nominal mismatch mo = 0.3 as described above. For 
the PALFA data spanning T = 268 s it covers the region of 
parameter space described in Sec. 4.5 with 6 661 orbital tem- 
plates. For data spanning T = 134 s, the bank (which now 
goes to shorter orbital periods and higher frequencies) con- 
tains 7 113 orbital templates. In both cases a single template 
with a sin(z) =0 was added by hand to facilitate the detection 
of isolated pulsars by the Einstein@Home pipeline. The ob- 
tained stochastic orbital template bank is shown in Figure 3. 

Monte-Carlo integration (as described in the previous sec- 
tion) was used to verify that the template banks have the spec- 
ified coverage and nominal mismatch. This was done using 
20 000 noise-free signals at / = /max with random orbital 
parameters and a sinusoidal pulse profile as previously dis- 
cussed. The resulting mismatch distribution (minimum over 
all templates) is shown in Figure 5. It demonstrates that the 
template bank has the desired coverage r] = 0.9 at nominal 
mismatch mo = 0.3. We note that the median mismatch 
mo. 5 = 0.17 is significantly smaller than the nominal mis- 
match mo. 

We used the same method to test if pulsars in elliptical or- 
bits could be detected by the Einstein® Home pipeline. These 
signals lie outside our parameter space, which includes only 



circular orbits. Thus it was unclear how well pulsars in ec- 
centric orbits could be detected by Einstein® Home. We 
again created 20 000 simulated signals at / = /max with 
random orbital parameters, but with non-zero orbital eccen- 
tricity e. Separate tests were done, with eccentricities e = 
10-"^, 10-^ 10-^ 0.025, 0.05, and 0.1. The longitude of the 
periastron was fixed at = in all runs^^. As before, the 
mismatch was minimized over all templates in the bank. 

For e < 0.025, there was no significant change in the mis- 
match distribution: the median and the 90%-quantile were 
similar to those obtained for circular orbits. Thus the Ein- 
stein® Home search can detect pulsars in orbits with e < 

0. 025 without significant sensitivity losses. 

For e = 0.05 and e = 0.1, as shown in Figure 6, the simula- 
tions show clear deviations from the mismatch distribution for 
circular orbits. The distribution shifts to higher mismatches, 
reaching e.g. mo. 9 = 0.48 for e = 0.1. In this case, for 
10% of the target signals, about half of the detection statis- 
tic (squared SNR) is lost: detection is still possible, but the 
search is less sensitive. 

4.9. Client search code 

The client search code is the part of the Einstein® Home 
radio pulsar search pipeline which runs on the volunteers' 
hosts and does the bulk of the computing work. Its input is 
de-dispersed time-series radio intensity data as described in 
Section 4.2. The client search code computes the detection 
statistics 6*0 , . . . , 5*4 at each template grid point in parameter 
space, and then returns back to the Einstein® Home server a 
list of "top candidates": the points in parameter space where 
the detection statistic was largest. 

The client search code is distributed under the GPL 2.0 li- 
cense and is publicly available from Einstein @Home^^, as are 
binary executables optimized for the complete range of sup- 
ported operating systems and CPU and GPU types. Further 
details of these optimizations are given in Sections 4.12 and 
4.14. 

Below, we give a detailed description of how the client 
search code operates. It carries out five main steps: 

1. The time-series data are uncompressed and type-converted. 

II. The data are shifted into the frequency domain and 
whitened, frequency bins affected by Radio Frequency Inter- 
ference (RFI) are "zapped", and the data are shifted back into 
the time domain. 

III. For each orbital template, this new time series is re- 
sampled in the time-domain to remove the effects of the or- 
bital motion. 

IV. The detection statistics ^o , . . . , 6*4 are computed in the 
frequency-domain using an EFT, searched over frequency for 
the largest values, and five lists of top candidates are main- 
tained. 

V. When the iteration over orbital templates is complete, the 
lists of top candidates are merged and the most significant 
candidates are returned to the Einstein® Home server. 
These steps are schematically illustrated in Figure 7 and de- 
scribed in detail below. 

(I) DATA UNCOMPRESSION /TYPE CONVERSION 



Allowing uu to vary would not change the results much: even at the 
largest eccentricity e = 0.1 the elliptical-orbit phase models have properties 
similar to the = ones. 

http : / /einstein . phys . uwm .edu /license. php 
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Figure 6. Test of the Einstein@Home template bank for simulated pulsars in eccentric orbits. The left panel shows the results for e = 0.05, and the right panel 
those for e = 0.1, respectively. The bars show histograms of the mismatch distribution obtained from 20 000 simulated noise-free signals. The curve shows the 
CDF of the mismatch. The median mo. 5 and the 90%-quantile of the mismatch distribution mo. 9 are highlighted. For eccentricities of 0.001 and 0.025, there is 
no significant loss of sensitivity compared with the circular orbit tests. 
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Figure?. Data analysis on the Einstein® Home hosts, as described in 
Sec. 4.9. The client search code (rectangular box) receives a de-dispersed 
time series as input from the Einstein@ Home download server. The data 
are searched with a large number of orbital models, then a list of the most 
statistically-significant candidates is returned to the Einstein® Home upload 
server. 



The uncompression and type conversion is done imme- 
diately after the cHent search code receives its input: one 
of the 628 WAPP (3808 Mock) different de-dispersed time- 
series data sets described in Section 4.2. In producing 
these, the original 16-bit or 4-bit instrumental data are con- 
verted to floating-point format for de-dispersion on the Ein- 
stein@Home server. To reduce the network bandwidth re- 
quired to transmit it to the host, the time- series is down- 
sampled to 4-bits (and if a significant compression factor can 
be achieved, compressed with gzip)- The first action of the 
client search code is to uncompress the data if required, and 
then convert it back into IEEE-754 single-precision floating 
point representation. A factor from the data file header is used 
to set the overall scale. This is only needed to avoid dynamic- 
range problems, and is irrelevant in what follows. 

(U) WHITENING / RFI ZAPPING 

The next stage of client processing is to whiten the time 
domain data. Whitening is necessary because instrumental 
noise and RFI can result in a very colored data spectrum. If 
the detection statistic were computed from this colored data, it 
would be impossible to compare the statistical significance S 
for templates at different frequencies. In addition, the detec- 
tion statistics Si^ . . . ^84 would be dominated by the "nois- 
iest" frequency band which appeared in the harmonic sum, 
and their statistical distribution would no longer be described 
by the distribution of Equation (7), which would make it 
impossible to compare the statistical significance of different 
candidates. 

To whiten the time-domain data (time-span T) they are first 
padded with 2T of zeros to produce a time-series of length 
3T. The data (which have had the mean removed) are then 
converted into the frequency domain using an FFT. The indi- 
vidual frequency bins have a frequency width A/ ^ 1 mHz; 
their contents are complex Fourier amplitudes. The modulus- 
squared of each amplitude (a periodigram) is computed bin 
by bin, then replaced with a running median value Ai us- 
ing a sliding boxcar window of width ±500 bins (covering 
±0.62 Hz). Finally, the data are whiten ed by multiplying the 
amplitude in each bin by ^/\n (2) /M. (For Gaussian data 
this normalization yields real and imaginary parts that are 
zero-mean unit- variance Gaussians.) The first and last 500 
bins are not whitened and are excluded from further analysis. 

We use the term "zapping" to describe the process of re- 
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Figure 8. The total frequency bandwidth zapped by the fixed zap list used 
in the Einstein® Home search as a function of the frequency /. Below / = 
100 Hz about 1 Hz (one per cent of the data) is zapped. 



placing data in frequency bins that are contaminated with RFI 
with random Gaussian noise. Zapping is needed because RFI 
introduces regular (periodic) variations into the radio inten- 
sity that can mimic pulsar signals and would dominate the 
candidate lists if not removed. In Section 4.10 we describe 
the "toplist clustering" technique that is used to create a list 
of top candidates. For most beams, these candidates are not 
dominated by RFI, although for certain beams the most sig- 
nificant candidates are from RFI. 

The frequency bands to be zapped were selected from a 
database of candidates generated by the Cornell pulsar search 
pipeline Cordes et al. (2013). The basic idea is that if an ap- 
parently periodic signal appears in many different sky posi- 
tions (beams) at different observation times, it can not be a 
radio pulsar, but must be due to RFI. 

The Cornell candidate database contained 2 030 604 can- 
didates up to frequency 7.8125 kHz and over the complete 
range of trial DMs up to lOOOpccm"^. In the database, 
654 468 of these candidates had been flagged as arising from 
RFI. These candidates were binned in frequency bins of 
width ^ 3.7mHz. Frequency bins containing more than 200 
candidates were then broadened by a fractional amount of 
1.05 X 10~^ to account for Doppler shift in frequency arising 
from Earth's orbital motion. Overlapping frequency bands 
were then merged to obtain a set of non-overlapping bands, 
and frequency intervals of ±0.25 Hz around the first three har- 
monics of the power-line frequency (60, 120, and 180 Hz) 
were added. For the Einstein@Home search, the relevant part 
of the zap list is transmitted to the host along with the search 
executable. 

The zap list is a two-column table of lower and upper 
frequency values, and extends up to the Nyquist frequency 
3.906 25 kHz of the down-sampled data. The same zap list 
is used for all beams: it contains a total of 233 bands cover- 
ing 72.383 Hz, which represents 1.85% of the total bandwidth 
of 3.9 kHz. Figure 8 shows the total frequency bandwidth 
zapped as a function of the frequency. Note that some recent 
work has demonstrated that RFI at Arecibo is highly time- 
dependent, so using a fixed zap list is not optimal. In future 
Einstein@Home searches it may be beneficial to instead use 
dynamic beam-dependent zap lists. 

The Einstein@Home search client receives this zap list and 
replaces the amplitudes of the corresponding frequency bins 
in the whitened Fourier spectrum with zero-mean Gaussian 
noise whose real and imaginary parts have unit variance. Then 



the whitened and zapped Fourier amplitudes are inverse-FFTd 
to shift the data back to the time domain. After the inverse 
FFT the time series is cut back to its initial length T by re- 
moving the previously padded bins at the end. This data con- 
ditioning is done only once per de-dispersed time series when 
the science code is started. However if the code is restarted 
from a checkpoint, the data conditioning is repeated, since it 
takes just a fraction of a second; the whitened and zapped time 
series is not stored on the Einstein® Home hosts. 

Whitening is done before and not after zapping, because 
typical RFI corrupts at most a handful of bins and so does not 
significantly bias the median estimator used for the whitening 
normalization. 

(Ill) ORBITAL DEMODULATION 

The client search code now begins to step through the or- 
bital templates one-by-one. For each orbital template with 
orbital parameters A, the detection statistics Sl of Equa- 
tion (6) are computed on the full frequency grid with spacing 
A/ = 1/3T. 

The detection statistics can be efficiently computed in the 
frequency domain. To do this, the time- series is first re- 
sampled so that instead of being indexed by uniform steps 
of time t at the telescope, it is indexed by uniform steps of 
time t' at the binary system's bary center. This demodulation 
is done by replacing the /c'th sample of the time series at time 
t = kAt by the sample closest (nearest neighbor) to time 
t\t). The time coordinate at the binary system's bary center 
is defined by the condition $(t, A) = 27^/t^ The definition 
of the pulsar spin phase Equation (1) then implies 

= t^ ^ sm {Qorht + . (17) 

c 

Offsets in time t'{t = 0) ^ are dropped. They correspond 
to constant phase offsets ^o, on which the detection statistic 
in Equation (4) do not depend. 

This transformation means that the phase which appears in 
the exponential of the detection statistic Equation (4) becomes 
exp(— 27rm/t'). Then Equation (4) simply becomes a Fourier 
transform^^: the detection statistic Vn is the squared-modulus 
of the Fourier amplitude of the re- sampled time series in the 
n'th frequency bin. 

Before the re- sampled time series is FFTd to compute the 
detection statistics, it is padded with its mean value in the 
same way as described earlier: to a total time interval of 3T. 
This lessens the reduction of the detection statistic for putative 
pulsar signals with frequencies that do not fall exactly at the 
center of a Fourier frequency bin; the maximum loss is 8.8% 
(Knispel 2011). 

(IV) DETECTION STATISTIC COMPUTATION 

The client search code internally maintains five different 
candidate lists (called "toplists") corresponding to the detec- 
tion statistics through 5*4. Here "candidate" denotes the 
point in parameter space as well as the value of Si. The z'th 
toplist includes the 100 candidates with the largest values of 
Si having distinct values of fundamental frequency /. The 
toplists are initialized with null entries (Si = 0) and then up- 
dated as follows. 

The systems we search for have non-relativistic orbital velocities 
'i^orb /c <C 1, so the factor dt/dt' = 1 + 0('Uorb / c) that appears when chang- 
ing integration variables is close to unity and may be neglected. 
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After the time domain data have been demodulated for an 
orbital template and FFTd, five arrays are created, indexed 
by frequency /, which contain through S4. Note that 
these statistics are obtained by combining values of V for 
harmonically-related frequency bins. This "harmonic sum- 
ming" can be quite compute intensive, in part because it re- 
quires striding over widely separated parts of the frequency- 
domain arrays, summing elements. For computational effi- 
ciency, the number of required summations are minimized by 
re-using the Sl with smaller L to compute those with larger 
L (Lorimer & Kramer 2004). 

The array containing the detection statistic Si is then 
stepped through, element by element. If the statistic Si{f) 
is less than the smallest statistic currently on the i'th toplist, 
then the next element is considered. Otherwise, the toplist 
is searched to see if it contains a candidate at the same fun- 
damental frequency f. If not, the toplist candidate with the 
smallest detection statistic is replaced with the new, higher- 
statistic candidate. If so, then the candidate at the same fre- 
quency is replaced with the new candidate if and only if the 
new candidate has a larger value of the detection statistic than 
the existing candidate. This procedure ensures that the 100 
entries on each toplist are for 100 distinct frequencies. 

The comparison process required to insert new candidates 
in the toplist can be quite compute-intensive. To speed it up, 
the comparison is only carried out for values of the detection 
statistic that lie above a predefined threshold. The threshold 
is data-independent: it is the largest statistic value expected in 
Gaussian noise for the relevant number of "independent trials" 
(roughly speaking, this is the number of orbital templates x 
the number of frequency bins). Further details may be found 
in Section 4.10 

(V) RESULT FILES 

When the loop over orbital templates is finished, the search 
code computes the statistical significance Equation (9) of the 
500 candidates stored in the five toplists. These are then win- 
nowed further: the 100 candidates with the largest statistical 
significance are selected, sorted into canonical order, and re- 
turned to the Einstein@Home server in a single result file. The 
remaining 400 candidates are dropped. 

Each de-dispersed time series generates one result file. The 
result files are ASCII text in a fixed format, with 105 lines. 
Five of these lines contain "comment" information, such as an 
identifier for the volunteer and the computer that did the com- 
putation, the date that the computation was completed, and 
a marker for the end of the file. The other 100 lines contain 
information about the most significant 100 candidates win- 
nowed from the toplists. 

Each candidate line contains seven white- space- separated 
values: the spin frequency / in Hertz, the orbital period Porb in 
seconds, the projected orbital radius a sin(i) in light-seconds, 
the initial orbital phase ^ in radians, the detection statistic Sl, 
the statistical significance S defined by Equation (9), and the 
number of harmonics 2^. 

4.10. Thresholding and candidate selection 

As discussed above, candidates are only checked against 
toplist entries if their statistics Vn exceed certain thresholds. 
These thresholds are computed from a false-alarm probability, 
provided as command-line parameter to the search code. 

The false-alarm probability for each orbital template in any 
de-dispersed time series is set to = 0.08. For 6661 or- 
bitals templates and in pure Gaussian noise data, we expect 



6661 X 0.08 ^ 530 candidates to exceed this threshold af- 
ter all orbital templates have been searched. Thus, the search 
code should always return ^ 100 candidates for each Vn, af- 
ter searching the complete template bank, and fully populate 
all five toplists. 

For easy thresholding during runtime, the global false- 
alarm threshold p is converted into a single-FFT-bin false- 
alarm threshold Psingie and thresholds on the detection statis- 
tics P*. The probability of not having a false-alarm in Nf 
frequency bins in random Gaussian noise is 1 — p = (1 — 
Psingie)^^- From this, we findpsingie = 1 - (1 - pY^^^ • The 
detection statistic threshold P*, is determined indirectly by 
Psingie = Q2Ar(2P*), whcrc Q is the incomplete upper gamma 
function as in Equation (8). 

We compared these expectations, based on Gaussian noise, 
with results from real data, and were able to verify that the 
Einstein® Home search is not dominated by non-Gaussian 
noise. The returned candidates in a single de-dispersed time 
series typically have S > 8.5, unless strong pulsar or RFI 
signals are present. For most beams this is not the case. The 
number of total trials per de-dispersed time series (neglecting 
parameter correlations in the detection statistic) is the prod- 
uct of the number of frequency bins and the number of orbital 
templates iVtot = ^/ x ^tempi = 3 x 2^^ x 6661 10^°. 
Assuming that the number of candidates exceeding a particu- 
lar significance threshold follows binomial statistics, one ex- 
pects of order A/tot x 10"^-^ ^ 133 candidates with S > 8.2 
from noise alone. Indeed, the search code always reports 
>100 candidates, validating the assumption above. 

As described, each beam is analyzed with 628 different DM 
values for WAPP data and 3808 different DM values for Mock 
data, respectively. For each DM value, the 100 top candidates 
are returned. So the search procedure always returns 62 800 
"candidates" or 380 800 "candidates" per beam, respectively, 
regardless of whether RFI is present or absent in the beam. 
Moreover, the 100 candidates for each DM value are at dis- 
tinct frequencies. This makes it harder for RFI to dominate 
the candidates for a given beam. 

There is a consensus among radio astronomers that RFI has 
become more severe in the past decade, probably due to the 
proliferation of wireless devices such as cellphones and WiFi. 
Nevertheless, the procedures we have described are reason- 
ably effective in mitigating the effects of this RFI. The Ein- 
stein® Home search is not dominated by non-Gaussian noise, 
in the sense that a typical beam returns statistic values in 
the expected ranges for Gaussian noise. Of course there are 
beams which contain strong RFI or strong pulsars for which 
this is not the case. 

If one looks across the entire search (not beam-by-beam) 
the top 1% of candidates are not consistent with Gaussian 
noise: these arise from pulsars or RFI. However if one looks 
further down the list, the distribution of statistic values are 
reasonably consistent with Gaussian noise. In fact the situa- 
tion is similar for the Pulsar Exploration and Search Toolkit 
(PRESTO) processing pipeline, which is also used to process 
PALFA data. In that pipeline, for each beam, the 200 strongest 
candidates are followed-up (folded and refined). For beams 
that are strongly affected by RFI, most or all of these candi- 
dates are not consistent with Gaussian noise. However for the 
majority of beams, the bulk of candidates are consistent with 
Gaussian noise. 

4. 1 1 . Client search code checkpointing 
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The search execution on the host may stop for many rea- 
sons. For example the volunteer might turn off the computer, 
or the BOINC client might stop execution because it appears 
that the volunteer is busy using the computer for other pur- 
poses. 

As described in Section 2.6 the client search code check- 
points on a regular basis, by default once per minute. This 
checkpointing saves the internal state of the search, and per- 
mits it to be efficiently restarted with very little computing 
time lost. The checkpointing is done by sorting and saving 
the toplist files, and then saving a counter which records the 
last orbital template that was completed. 

When the search is started (or restarted) it carries out the 
whitening and zapping steps on the input data, and then 
checks if a valid checkpoint file exists. If not, the search be- 
gins execution at the first orbital template as previously de- 
scribed. However if a valid checkpoint file is found, then the 
toplists are initialized from the stored values, and the loop 
over orbital templates begins following the orbital template 
index recorded in the checkpoint file. 

4.12. CPU Implementation of the Search Algorithm 

The search algorithm is implemented in the C program- 
ming language. Mathematical functions are provided by the 
standard C math library with special functions from the GNU 
Scientific Library (GSL, Galassi et al. (2009)) and FFT rou- 
tines from the Fastest Fourier Transform in the West (FFTW, 
Frigo & Johnson (2010)) library. The search code is then 
wrapped into the BOINC framework (Anderson et al. 2006) 
as described earlier. The implementation is single-threaded, 
i.e. hosts simultaneously execute one instance on each CPU 
core that BOINC allocates to the search. 

To produce executable binaries, the Linux applications are 
compiled using standard GNU tools. The applications for 
Mac OS X are built using the Mac OS X 10.4 SDK build en- 
vironment. For Windows, the applications are cross-compiled 
on Linux machines using the MinGW tools"^^. The underly- 
ing compiler in all three cases is the GNU C Compiler; this 
permits identical optimizations and execution ordering on all 
platforms. 

4.13. Einstein@Home Processing speed / Throughput 

The speed with which Einstein@Home can process one 
beam of PALFA data is determined by the amount of com- 
puting time required for a single workunit and the number 
of workunits per observed beam. These have varied over the 
years as the processing code was made more efficient; the 
number of participating volunteers has also varied. 

Individual workunits should not take too long to run on a 
host: volunteers become discouraged if the results of their 
processing do not quickly lead to successful results and visi- 
ble computing credits. The workunits also should not be too 
short, or the Einstein@ Home database gets too large to oper- 
ate efficiently, and the overhead of uploads, downloads, and 
sending new workunits to hosts becomes excessive. In gen- 
eral our goal has been to have workunit run-times of between 
one hour and one day. As the application code became faster, 
we achieved this by bundling multiple single workunits into 
larger ones: the runtime has remained between one hour and 
one day for the majority of the hosts, the lower end populated 
by the GPUs. 

http : / / www . mingw . org/ 



The first implementation of the Einstein@Home search ran 
from 2009 March to 2010 February and processed on aver- 
age ^ 25 WAPP beams each day. After that, two major 
code improvements increased the processing speed by a fac- 
tor of ~ 6, and between 2010 February and 2010 August, 
Einstein@ Home processed ?^ 160 WAPP beams per day. The 
first GPU version of the search code increased the processing 
rate to more than 300 beams per day between 2010 September 
and 2010 December. 

The Einstein® Home search of the Mock spectrometer data 
started in 2011 July and processed on average ^ 50 beams 
per day until 2012 September. After that date, the process- 
ing rate gradually increased over a period of three months and 
has been running at around 160 beams per day since the end 
of 2012. As of February 2013, the majority of the Mock data 
(see Table 1) has been analyzed, and the data processing back- 
log is less than two months. 

4.14. GPU Implementation of the Search Algorithm 

As previously described, Einstein® Home also takes advan- 
tage of the Graphics Processor Units (GPUs) available on 
a substantial fraction of host machines, providing applica- 
tions for NVIDIA GPUs which support CUDA version 3.2 
or higher, and for AMD/ATI GPUs which support OpenCL 
version 1 . 1 or higher. CUDA and OpenCL are programming 
models, API interfaces, and support libraries which enable 
GPUs to be used for scientific computation. 

The supported GPUs typically execute double-precision 
floating point operations very slowly compared to single- 
precision operations, or do not support them at all. So the 
CPU codes were designed so that all floating-point operations 
can be performed in single precision. Tests with simulated 
pulsar signals were performed to ensure that this does not de- 
grade the sensitivity of the search. 

The code was also designed to have a reasonably small 
memory "footprint", particularly because of limits imposed 
by consumer-grade graphics cards. The GPU version requires 
less than 250 MB of GPU memory, which substantially en- 
larges the set of GPU cards on which the code can run. 

The overall structure of the GPU code is similar to that 
of the CPU version (see Figure 7), with the most compute- 
intensive analysis offloaded to the GPU. These are the time- 
series re- sampling to remove the effects of orbital motion via 
demodulation, the FFT and power spectrum computations, 
and the harmonic- summing to obtain the Sl from the Vi. For 
NVIDIA GPUs, the CUDA 3.2 programming framework 
was used to embed calls to CUDA-C code (kernels) executing 
on the GPU. On AMD/ATI GPUs, the OpenCL programming 
framework was used for the same purpose. 

To maximize GPU utilization, the GPU implementation of 
the time- series re- sampling is split into five CUDA kernels 
to maximize thread parallelization. To avoid the overhead 
of memory transfers to the host CPU, intermediate output is 
kept in GPU memory as much as possible. The time-offsets 
t — t' needed for re-sampling are computed in parallel, using a 
lookup table and interpolation to avoid costly sine/cosine op- 
erations. An identical lookup table is pre-computed for both 
CPU and GPU hosts to help ensure that their results cross- 
validate later in the processing pipeline (see Sec. 4.9). We 
use intrinsic functions to avoid generating fused multiply- 

https://developer.nvidia.com/ 
cuda- toolkit- 32 -downloads 
http : / / www . khronos . org 
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add instructions that could introduce rounding errors which 
would also hamper cross-validation. The length of the mod- 
ulated time series is computed in a separate kernel, and the 
re-sampling itself is done by yet another kernel, using the 
time-offsets and the time-series length computed in the pre- 
viously. Each time- series sample is computed in parallel by a 
separate GPU thread. A parallel sum-reduction algorithm is 
then used to compute the mean of the re-sampled time-series, 
and a final CUDA kernel implements the mean-padding of the 
re-sampled time-series. 

To perform FFTs efficiently on NVIDIA GPUs, the 
NVIDIA CUFFT 3.2 library"^^ is used. The CUFFT library has 
an FFTW compatibility mode, which simplified development 
and integration with the CPU code. A custom CUDA kernel 
is used to compute the power spectrum in parallel from the 
FFT output. Intermediate as well as the final output (for the 
next step) is again kept in GPU memory. 

The GPU implementation of harmonic summing differs 
from the CPU version: the GPU version re-orders the com- 
putations so that hundreds of processing cores on the GPU 
can independently perform calculations in parallel. Memory 
caching is needed, because of the low locality and irregu- 
lar access strides associated with summing the different har- 
monics of /. Caching is done in texture memory; without 
it the memory access pattern of the individual threads would 
be very inefficient. Write accesses have been eliminated, ex- 
cept for those associated with the (comparatively rare) signals 
that might make it onto the candidate toplist, i.e. detection 
statistics exceeding the false-alarm threshold and the weakest 
toplist signal. 

GPU versions of the host applications are provided for 
Linux, Windows, and Mac OS X operating systems. The Win- 
dows version is cross-compiled under Linux for the same rea- 
son as described in Sec. 4. 12: to improve cross-platform result 
validation. This also allows for a tighter integration in the au- 
tomated build system of Einstein® Home, but adds complex- 
ity because cross-compilation requires the use of the lower- 
level CUDA driver API instead of the higher-level CUDA run- 
time API. 

The OpenCL implementation differs somewhat from the 
CUDA one. The FFT library is derived from software devel- 
oped by Apple"^"^. As provided, the Apple library can only do 
complex-to-complex FFTs of arrays whose length is a power- 
of-two (2^); we extended it to efficiently do real-to-complex 
transforms of length 3 x 2"^, as required by the search code. It 
was also modified to eliminate calls that approximate trigono- 
metric functions with different accuracy on different GPUs. 
This reduces the numerical difference between different GPU 
models, making the results more hardware-independent and 
simplifying result cross-validation. 

OpenCL is a vendor-independent framework and the 
OpenCL application also runs on NVIDIA graphics cards that 
support OpenCL 1.1. Somewhat surprisingly, we found better 
numerical agreement between the OpenCL application run- 
ning on ATI/AMD GPUs and the CUDA application running 
on NVIDIA cards, than between the (same!) OpenCL appli- 
cation running on both ATI/AMD and NVIDIA GPUs. 

The GPU version of the search application evolved consid- 
erably over time, by incrementally porting more steps of the 
main loop to code executing on the GPU. The first GPU ver- 

https : / /developer . nvidia . com/cuf f t 

http : //developer . apple . com/library/mac/ 
#samplecode/OpenCL_FFT 
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Table 3 

Comparison of run-times for the CPU (using only one core) and GPU 
versions of the client search application, processing a single de-dispersed 
time-series through a template bank containing 6662 orbital templates. The 

different rows show the execution time spent in the different functional 
blocks of Figure 7. The absolute run- times vary considerably for different 

combinations of CPU and GPU models; we measured it for typical 
consumer-grade hardware. The CPU is an Intel Core 2 Q8200 (2.33GHz), 
the CUDA GPU is a NVIDIA GTX 560 Ti and the OpenCL GPU is an 
AMD Radeon HD 7970 (all running on unloaded Linux systems). 

sion of the search appHcation only implemented the FFT step 
on the GPU, and was limited to a speed-up of between 2 and 
3 compared to the CPU version, because on the CPU version 
the FFT step consumes almost two-thirds of the total CPU 
run time. The next important step was to port the re- sampling 
code to the GPU. This gave an overall speed-up of about 4 
compared to the CPU version, and left the harmonic-summing 
step dominating the run time. When the harmonic summing 
step was also ported to the GPU, the overall speed-up factor 
reached 50 (and even higher on some CPU and GPU com- 
binations). Table 3 shows typical run time examples for the 
current GPU and CPU version of the client search application 
and the relative fraction of time spent in each processing step. 

Running one instance of the application, a typical high-end 
NVIDIA GPU (for example the GTX 560) achieves up to 85% 
utilization"^^. Provided that the GPU has sufficient memory, 
BOINC can run two or three instances in parallel. This satu- 
rates the GPU, achieving more than 98% utilization! 

4.15. Validation 

As described earlier, any result file uploaded to the Ein- 
stein® Home servers must be validated because it could be 
partially or completely incorrect, and/or corrupted. Validation 
is done on the Einstein@Home server, by comparing the re- 
sult file to another result file for the same workunit, generated 
on another host. An automatic validator compares results and 
rejects those that appear to be corrupted and/or inconsistent 
with other results. 

The validation process is not trivial; it can not be based on a 
simple binary comparison of the two result files, because the 
use of different floating-point libraries, compiler instructions, 
and hardware can lead to numerical differences in the results. 
Thus, results from two different hosts might both be correct, 
but not binary identical. So the comparison process must al- 
low for numerical differences at a level which is typically of 
the order one part in 10^. 

For Einstein® Home, the validation process operates in two 
steps. The first step checks a single result file for syntax and 
internal consistency. The second step compares two (or if nec- 
essary, more than two) results which have passed the first step 
against one another. Most incorrect or invalid results are de- 

The utilization is reported by NVIDIA's System Management Inter- 
face nvidia-smi; information may be found at https : //developer . 
nvidia . com/nvidia- system-management- interface 



Einstein@Home and PSR J2007+2722 



21 



tected in the first step. 

In the first "syntax and consistency" step, a result file is 
checked to see if it has the fixed seven-column output for- 
mat with 100 lines described near the end of Section 4.9. For 
each line, the seven fields are individually checked to con- 
firm that they are valid numbers and lie in pre-defined ranges. 
The overall ordering of lines within the file is also checked 
to confirm that they are ordered by decreasing significance. 
If any of these checks fails, then the result is marked invalid, 
and another copy of the corresponding workunit is generated 
on the Einstein® Home server sent to a different volunteer's 
computer. Slightly less than 1% of results fail to validate at 
this stage^^. 

In the second step, two or more result files that have passed 
the first step are are pairwise-checked for mutual consistency. 
The validator tries to match each line from one result file to 
a line in the other. Two lines "match" when the individual 
values for DM, /, the orbital parameters, the S'l, and S agree 
within less than a fractional error of 10~^. The number of 
harmonics 2^ must match exactly. 

The last lines in the result files are typically near the noise 
threshold and because of differences in floating-point accu- 
racy and rounding on different hosts, they may not correspond 
to the same candidates. Thus the validator permits unmatched 
lines in the result files if (within fractional error 10~^) the 
corresponding candidate might not have appeared in the most 
significant 100 results in the other result file. 

If two results both pass the "syntax and consistency" step, 
but are found to be inconsistent, another instance of the work 
is generated and sent to a different client machine. The pro- 
cess of generating further instances of the results is repeated 
until a consistent set are found, containing two or more re- 
sults. Those results that are inconsistent with that set are 
marked as invalid; slightly less than 0.5% of results fail vali- 
dation in this way^^. If more than twenty results are generated 
without getting a match, then warning messages are sent to 
project personnel, and the workunit "errors out". 

4.16. Post-processing 

The client search code identifies the 100 most statistically 
significant signal candidates in 628 de-dispersed WAPP (3808 
Mock) time series for each telescope beam. Ideally, all signif- 
icant candidates should be followed up using the "raw" obser- 
vational data with the full time resolution. In practice, this is 
neither computationally feasible nor necessary, because a real 
pulsar can be detected at different DMs, frequencies, and in 
multiple orbital templates. 

Several different sifting methods are used to reduce the 
number of candidates to follow up. These include overview 
plots for the inspection all candidates in a given beam (de- 
scribed below) and an automated filtering routine, summa- 
rized here and described in detail in Knispel et al. (2013). 

When valid result files (see previous section) for all de- 
dispersed time series of a given beam are available on the 
Einstein @ Home upload servers, a set of overview plots is au- 
tomatically produced for visual inspection. Theses show all 
candidates in a given beam in the multi-dimensional parame- 
ter space of DM, spin frequency, and orbital parameters, pro- 

See "Validate error rate" at http://einstein6. 
aei . uni-hannover . de/EinsteinAtHome/ download/ 
BRP -progress/. 

'^'^ See "Invalid result rate" at http : //einstein6 . 
aei . uni-hannover . de/EinsteinAtHome/download/ 
BRP-progress/. 



jected into two and three dimensions. Pulsars are identified 
by the characteristic patterns they produce. 

A combination of five different plots are used in post- 
processing. As an example. Figure 9 shows the highly- 
significant detection of PSR J2007+2722 in the Ein- 
stein® Home results. For each candidate the left-hand panel 
shows the significance 5 as a function of the trial DM number 
and spin frequency. The right-hand panel shows four projec- 
tions into subspaces of the parameters. These help identify 
pulsar candidates and provide initial estimates of spin and or- 
bital parameters. 

These plots use coordinates defined in the Appendix of 
Knispel et al. (2013). They are obtained from writing the 
phase model (1) as a power Taylor series in t. Then, the coef- 
ficient of the linear term z/i = / (1 + a sin(i)r^orb cos (-0) /c) 
identifies a spin frequency. The coefficient of the quadratic 
term U2 = —a sm{i)Ql^^f sin {ip) / (2c) is proportional to the 
Doppler spin-down or spin-up. 

Promising candidates are identified from the visual inspec- 
tion of these plots. The number of promising candidates is 
relatively small. The majority of PALFA beams have none; 
the most promising beams have at most a handful. 

In the next step, PRESTO software tools are used to fold 
the full-resolution filterbank data for all candidates, starting 
with the spin-period and DM values identified from the Ein- 
stein® Home results. The PREPFOLD plots are inspected by 
eye and used to judge the broadband nature and temporal con- 
tinuity of the signal. The ATNF catalogue (Manchester et al. 
2005), and webpages listing known pulsars"^^ are checked to 
ensure that the candidate is not a detection of a known pulsar. 

We also developed an automated routine which filters 
through the list of all candidates for a given beam and re- 
turns the most promising candidates. These candidates are 
then followed up automatically with different software tools 
described in detail in Knispel et al. (2013). The automated 
routine consolidates candidates at harmonically related fre- 
quencies, neighboring DMs, and similar orbital parameters. 
The remaining candidates are folded with PREPFOLD to pro- 
duce folded pulse profile and other diagnostic plots, as well 
as associated ASCII files. These are then filtered by a second 
piece of software, which uses these plots and ASCII files to 
select the most "pulsar-like" candidates (Knispel et al. 2013). 

In the Einstein@Home search of the PALFA WAPP data 
no previously unknown pulsars were identified with the 
automated search routine. Both PSR J2007+2722 and 
PSR J 1952+2630 were found through visual inspection of the 
overview plots. 

5. DISCOVERY OF PSR J2007+2722 

PSR J2007+2722 was found by project scientists on 1 1 July 
2010 as part of the routine post-processing described in the 
previous section; the corresponding data had been acquired at 
Arecibo on 11 February 2007. In the post-processing plots 
(Figure 9) the pulsar appeared with maximum significance 
S = 169.7 at a dispersion measure DM=127pccm~^ and 
spin frequency of 40.821 Hz. The orbital parameters at high- 
est significance were consistent with no orbital modulation, 
or with an orbital period longer than the longest orbital pe- 
riod in the template bank. In other words, it appeared that 

http: //www. as .wvu.edu/~pulsar/GBTdrift350/, 
http: //www. physics .mcgill . ca/ ~hessels/GBT350/ 
gbt350 .html, http: //astro. phys .wvu.edu/dmb/, 

http : / / www . naic . edu/ ~palf a/ newpulsars/ 
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Figure 9. Example post-processing overview plots, showing the highly-significant detection of PSR J2007+2722. (Left) This plot shows the significance <S as a 
function of the DM trial number and the fiducial spin frequency (see Sec. 4.16) of each candidate. The color-code displays the relative change in fiducial spin 
frequency ^2/1^1 from orbital motion. Since the top 100 candidates are reported for each DM trial and the pulsar is detected with very high significance, there 
are no detections of the noise floor in a DM range around the pulsar. (Right) The four sub-panels show the significance «S as a function of different combinations 
of spin frequency and the orbital parameters. 



the pulsar was either isolated, or was in a long-period binary 
system. Further PRESTO-based analysis (Ransom 2002) re- 
fined these values and supported the isolated or long-period 
interpretation. 

The discovery was confirmed with a short Green Bank Tele- 
scope (GBT) observation soon thereafter, following which the 
pulsar was (re)observed at Arecibo, Jodrell Bank and Effels- 
berg. Details of later GBT studies are given in Section 6.2. 
A full timing analysis based on dozens of additional observa- 
tions extending to late-2012 is given in Section 6.3. 

Because the project database, and the result files them- 
selves, contain information about the computers that carry out 
analysis, it is straightforward to identify the volunteers whose 
computers provide any particular result. As described in the 
Section 4.15 on validation, all Einstein@Home work is sent to 
computers owned by at least two different volunteers. In this 
case, the valid result files containing the statistics of highest 
significance for PSR J2007+2722 were returned by comput- 
ers owned by volunteers from Ames, Iowa, USA and from 
Mainz, Germany. 

The US volunteers were Chris and Helen Colvin. For secu- 
rity reasons, the Colvins are not allowed to use their "work" 
computers for personal email and web browsing, so they 
maintain a small mail and web server at home. Since 2006, 
this home computer has been running Einstein@Home as a 
background job. The machine was equipped with an NVIDIA 
graphics card, whose GPU did the "discovery" processing. 

The German volunteer was Daniel Gebhardt, who is the 
system administrator for a Musikinformatik group at Univer- 
sitast Mainz. Gebhard runs a mail server for the group, which 
is continuously powered up, and runs Einstein@Home as a 
background task. 

It is notable that the first discovery from the Ein- 
stein® Home pipeline, which was designed to find pulsars in 
binary systems, was an isolated pulsar, which was not found 
in either of the other PALFA processing pipelines. This is 
not unexpected: as described previously, the Einstein@Home 
search pipeline contains long-orbital period templates and one 
template with infinite period, so it can detect isolated systems. 
But why was it not found by the other pipelines? 



In fact this is not surprising: the three pipelines in ques- 
tion (Einstein@Home, PRESTO and Cornell) produce statis- 
tical outlier candidate signals that are different owing to re- 
sampling differences, to differences in the way orbital mo- 
tion is treated, and to the way signals that exceed statisti- 
cal thresholds are reported. In total, each of these pipelines 
has involved about 10^^ statistical tests so far, and the ini- 
tial reduced set of candidate signals is in the millions. The 
three pipelines have different procedures and criteria for fur- 
ther winnowing these candidate signals into much shorter lists 
of viable pulsar candidates worthy of detailed visual inspec- 
tion and follow-up observations at the telescope. The three 
pipelines also process the data in different order, and at the 
time of the PSR J2007+2722 discovery, all three had data 
backlogs: the fact of the matter is that the E@H pipeline 
found PSR J2007+27 first. Retrospectively, the pulsar could 
be seen in the output of one other pipeline (i.e. when we 
knew what to look for), while the other pipeline had not yet 
processed the relevant beam. In just the same way, the other 
pipelines have also found new pulsars that the E@H pipeline 
subsequently also detected. 

DISTANCE TO PSR J2007+2722 

Based on the NE2001 model (Cordes & Lazio 2002) for 
the Galactic distribution of free electrons, the DM=127 dz 0.4 
value implies a distance of 5.4 kpc. The uncertainty in dis- 
tance arising from the 0.4 pc cm~^ error in DM is negligible 
in comparison with the NE2001 model uncertainty. We know 
of two ways to bound this model uncertainty. 

A direct measurement of errors in the NE2001 model can 
be obtained from comparisons of NE2001 distance estimates 
to actual parallax-based distance measurements (see Chatter- 
jee et al. (2009, and references therein)). While direct com- 
parisons are only possible for objects significantly closer than 
J2007+2722, for objects within 10° of the pulsar, the paral- 
lax and DM distances agree to within 20%. Thus this direct 
measurement would suggests errors of less than 20% in the 
5.4 kpc distance estimate. 

To indirectly estimate the NE2001 model uncertainty, we 
first need to identify if an HII region or void perturbs the elec- 
tron density along the line of sight, which would increase this 
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uncertainty. In the case of PSR J2007+2722, we could not 
identify any specific HII region or source of radio recombi- 
nation along the line of sight. The closest young star cluster 
on the sky is IRAS 20050+2720, about 20 arcmin away from 
the line of sight and ~ 0.7 kpc distant from Earth. IRAS 
20050+2720 has no massive stars that could produce a de- 
tectable HII region (Giinther et al. 2012). IRAS and 5-GHz 
VLA images also do not show any extended emission near 
the line of sight. Thus, we estimate the NE2001 model un- 
certainties following the approach given in Section 4.2 and 
Figure 12 of Cordes & Lazio (2002). We assume that the DM 
is perturbed by subtle departures from the model at the level 
of ADM = 10 and 20 pc cm~^. These alter the inferred 
distance by ±0.3 and 0.6 kpc, respectively, corresponding to 
~ 6% and 11% errors, or a maximum total error of 17%. 

Choosing the worst case, we conservatively estimate the 
distance error to be less than 20%, and conclude that the dis- 
tance of PSR J2007+2722 is 5.4 zb 1.1 kpc. 

6. FOLLOWUP OBSERVATIONS AND CHARACTERIZATION OF 
PSR J2007+2722 

6.1. Accurate determination of the sky position 

GRIDDING OBSERVATIONS WITH THE ARECIBO TELESCOPE 

The initial discovery of PSR J2007+2722 determined the 
sky position within about 2 arcmin: the Arecibo beam radius 
at 1 .4 GHz. In normal circumstances, one determines pulsar 
positions more precisely using timing measurements over a 
period of a year or longer. Carefully fitting pulse arrival times 
to a timing model makes it possible to determine the sky po- 
sition with an angular error ^7 ~ eP/D = 3 x 10~^ radians, 
where e ^ 10~ is the typical time-of-arrival (TO A) error, 
measured as a fraction of the rotation phase, P = 1/f = 
25 ms is the pulsar period, and D 10^ s is the light travel 
time across the diameter of the Earth's orbit. This corresponds 
to a position error ^ 6 milliarcsec; a timing-model po- 
sition determination to such accuracy can be found in Sec- 
tion 6.3 

However, the discovery of PSR J2007+2722 was an im- 
portant milestone for Volunteer Distributed Computing, and 
waiting a year to precisely determine the sky position using 
timing was not an option. Nevertheless, we were able to nar- 
row down the sky position using a combination of methods, 
in order to search for associated X-ray or gamma-ray sources 
and to set a limit on the magnitude of the spin-down P. (If 
the TOA measurements cover much less than one year, then 
uncertainties/errors in sky position are degenerate with uncer- 
tainties/errors in P.) 

The first step in determining the sky position of 
PSR J2007+2722 more precisely was with a set of "grid- 
ding" measurements using the Arecibo telescope on 2010-07- 
19 (project p2557). The observations were done in S-band us- 
ing the Mock spectrometers to construct five 172 MHz bands 
(center frequencies 2136, 2308, 2687, 2859, and 3013 MHz) 
with 1024 channels per band and a 65.5 /iS sampling time. A 
filter at the upper end of the S-band receiver bandwidth (band- 
passes at 2040-2400MHZ and 2600-3 lOOMHz) was used to 
minimize RFI and reduce the half-power beam width to 2 ar- 
cmin. 

The results of these first gridding measurements are shown 
in Table 4. A square grid of 9 pointings was used, with the 
center at RA 20h07ml4s DEC 27°24'26'', and the adjacent 
pointings offset by about ±1 arcmin (±4 s in RA and ±1' 
in DEC); the half-power beam contours overlapped by about 
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Table 4 

Arecibo gridding measurements used to refine the sky position of 
PSR J2007+2722. The pulsar was visible in five of the nine pointings; the 
table entries show the ratio of the folded profile peak to the rms noise floor. 

7 arcsec in RA. As shown in the table, the pulsar was detected 
in five of the nine pointings. A weighted average of the two 
pointings with the largest SNRs gave a position estimate RA 
20h07ml2.7s, DEC 27°23'26''. We were confident that the 
pulsar was inside a 1 arcmin radius circle about this point. A 
weighted average of all five pointings gives a position esti- 
mate differing by about 25 arcsec, but was considered biased 
because there were no observations to the South of the bright- 
est grid point. 

OBSERVATIONS WITH WESTERBORK SYNTHESIS RADIO 
TELESCOPE 

To further refine the sky position, observations were made 
with Westerbork Synthesis Radio Telescope (WSRT, Nether- 
lands) at central frequency 1380 MHz with a 160 MHz band- 
width. WSRT is a linear array of 14 circular radio anten- 
nas, each 25m in diameter, arranged on a 2.7 km East- West 
line. Aperture synthesis creates a fan-beam approximately 
12'' X 30' in size, with the long axes along the North-South 
direction at transit. On the evening of 2010-07-19, ten 1 180 s 
observations were made, with the center of each observation 
displaced by 12", as schematically shown in Figure 10. These 
covered the uncertainty region obtained from the Arecibo 
gridding observations. For each WSRT observation, the data 
were de-dispersed and folded with the PSR J2007+2722 pe- 
riod and DM using PuMa-II (Kamppusamy et al. 2008), a 
high time-resolution coherent de-dispersion pulsar-processing 
back-end. We believe that this was the first time that WSRT 
has been used for pulsar position refinement in this way. 

The pulse profile was only convincingly detected in con- 
tiguous beams 7 and 8, with respective SNRs 25 and 20, 
as shown in Figure 10. Weighted overlapping of fan beams 
7 and 8 yields a position-constraint ellipse centered at RA 
20h07ml4.5s, DEC 27°23'36" as shown in Figure 11. The 
major and minor radii are 51 arcsec and 7 arcsec; the major 
axis is rotated 20° clockwise from North. 

WESTERBORK IMAGING AND NVSS CATALOG SOURCES 

Simultaneously with pulsar data, WSRT imaging data were 
also acquired. Shown in Figure 1 1 is the radio image, along 
with the error ellipse just described. A single radio source is 
visible on the southern side of the error ellipse, just within 
the position circle obtained from the Arecibo gridding. This 
source is also listed in the 1.4-GHz National Radio Astro- 
nomical Observatory (NRAO) Very Large Array (VLA) Sky 
Survey (NVSS) catalog (Condon et al. 1998). The cat- 
aloged source NVSS 200715+272243 has coordinates RA 
20h07ml5.86s, DEC 27°22'43.5'', a cataloged flux density of 
2.3mJy at 1400 MHz, and an estimated size less than 3.3 arc- 
sec, consistent with the WSRT image. 

VLA DATA ARCHIVE 
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Figure 10. Right: schematic illustration of ten WSRT fan beams overlapped 
with the 1 arcmin-radius error circle obtained from gridding observations at 
the Arecibo observatory. The fan beam ellipses are not to scale: the minor 
axis is correct but the major axis is much longer than shown here. Left: 
folded pulse-profiles for fan beams 7 and 8. The horizontal axis is pulse 
phase; the vertical axes shows normalized flux. PSR 12007+2722 was not 
detected in any other fan beam. 




Figure 11. An image of the WSRT data, along with the error circle obtained 
from Arecibo gridding, and the WSRT error region obtained by overlapping 
fan beams 7 and 8 as described in the text. The imaged source corresponds 
to a cataloged NVSS source and is PSR J2007+2722. 

It was possible to determine the position even more pre- 
cisely from archival data. This part of the sky contains the 
young star cluster IRAS 20050+2720 (Giinther et al. 2011). 
The VLA data archive contains a 1610-s on-source observa- 
tion of IRAS2005 taken on 1997-08-14 (VLA project code 
AE0112A, data-set VLA_XH97065_file6.dat); the Field of 
View (FOV) is approximately 16 x 16 arcmin. The data 
were acquired with the VLA C array operating in a 50- 
MHz bandwidth centered at 4.8601 GHz, in full Stokes 
mode, with a central beam position RA 20h07m05.859s, DEC 
27°28'59.7r. 

We analyzed the full FOV using MAXFIT to characterize 
the 8 point sources and one extended source which are visible 
above the background noise. Shown in Figure 12 is the part 
of this data (about 10 x 14 arcmin) containing the sources, 
which are circled. 

As can be clearly seen in Figure 12, only one of these 
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Figure 12. An image of (a part of the) archival VLA data at 4.8 GHz. To 
compensate for the drop-off in sensitivity near the edge of the primary beam, 
the intensity has been divided by a model for primary beam response. The in- 
tensity has an rms of 42 /iJy; there are 9 sources brighter than 170 /iJy, which 
are shown in the dashed circles. The source fluxes determined by MAXFIT 
are given in mJy, before dividing by the beam response; the extended source 
is indicated by "E". The larger circle is the 1 -arcmin-radius source uncer- 
tainty region found by the Arecibo gridding and the region between the two 
"parallel lines" is the relevant portion of the uncertainty ellipse found by the 
WSRT gridding. There is only one source (PSR J2007-h2722) lying in both 
uncertainty regions; it has a unnormalized flux of 210 /iJy and a normalized 
flux of 1.2mJy. 

(point) sources lies inside the uncertainty regions obtained 
from the WSRT and Arecibo observations. This is shown 
in more detail in Figure 13. The point source has coordi- 
nates RA 20h07ml5.77s, Dec 27°22'47.68'' and an uncor- 
rected flux density of 0.21 mJy; the primary beam-corrected 
flux density is 1.2 mJy (±10%) at 4.86 GHz. (The absolute 
flux density measurement is referred to 3C48; the errors arise 
primarily from uncertainties in the primary beam model, be- 
cause the source is close to the edge of the beam.) The flux 
density is consistent with the normal spectral behavior of sim- 
ilar radio pulsars; we conclude that this is the correct location 
of PSR J2007+2722. 

6.2. Multi-frequency Observations and Emission Geometry 
OBSERVATIONS AT 327 AND 430 MHZ 

PSR J2007+2722 was not detectable with the Arecibo tele- 
scope at 327 or 430 MHz. Further observations will be at- 
tempted in April 2013. 

GREEN BANK TELESCOPE OBSERVATIONS 

The Green Bank Telescope (GBT) carried out additional 
follow-up observations on 2010 July 21, in bands centered 
at 820, 1500, 2000, and 8900 MHz. Fufl Stokes data were ob- 
tained for the observations at 1500, 2000, and 8900 MHz, but 
the 8900 MHz data was too noisy to be useful for polarimetry. 

All GBT observations of PSR J2007+2722 were carried 
out using the Green Bank Ultimate Pulsar Processing Instru- 
ment (GUPPI) in incoherent de-dispersion mode. The 
observations at 820 MHz used 200 MHz total bandwidth, 
2048 spectral channels and 40.96 /iS time resolution. For the 
1500 MHz and 2000 MHz observations, 800 MHz total band- 
width, 2048 channels and 25.6 /is time resolution were used. 
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Figure 13. Archival VLA data at 4.8 GHz. Top: a 180 x 144^^ region 
with the r -radius uncertainty region from Arecibo gridding and the 13"-wide 
uncertainty region from WSRT gridding; the overlap contains a single VLA 
source (small circle). Middle: a 30 x 24'' zoom showing the 1.2 mJy VLA 
source near the south side of the previous region. The 1 "-radius circle shows 
the uncertainty region obtained by fitting a Gaussian to the VLA intensity; the 
discovery publication used this as the PSR J2007-I-2722 position. Bottom: a 
5 X 4'' zoom; the cross indicates the location of PSR J2007-K2722 obtained 
in this paper by timing analysis. It lies inside the l"-radius VLA uncertainty 
region. The intensity scale has been changed in the bottom plot to show the 
brightest VLA pixels. 

At 8900 MHz, the parameters were 512 channels, 800 MHz 
bandwidth and 6.4 /is time resolution. 

The total observation time at each frequency was approx- 
imately 30 minutes. Along with each pulsar observation, a 
short amount of data were recorded with the local calibration- 



Table 5 

Measured flux density of PSR J2007 -1-2722 at different radio frequencies. 



Frequency (MHz) 


Flux Density (mJy) 


Pulsed/Total? 


Instrument 


820 


1.6 


P 


GBT 


1400 


2.3 


T 


NVSS Catalog 


1500 


2.1 


P 


GBT 


2000 


1.7 


P 


GBT 


4860 


1.2 


T 


VLA Archive 


8900 


0.3 


P 


GBT 



The pulsed measurements (P) only show the component of the flux density 
that varies with pulse phase. The total measurements (T) also include the 
phase-independent part. The methods used to determine the VLA/NVSS 
fluxes are described in the section about position determination; the meth- 
ods used to determine the GBT are described in the text. 



noise source pulsed at 25 Hz. The equivalent noise source flux 
in each polarization channel was determined by observing 
standard astronomical flux calibration sources (3C190 was 
used at 820, 1500, and 2000 MHz; 3C48 at 8900 MHz). The 
noise source measurements were then used for polarimetric 
calibration (differential gain and phase) and absolute flux cal- 
ibration of the pulsar data. All data processing described in 
this section was performed using the PSRCHIVE^^ software 
package (Hotan et al. 2004). 

The low flux density at 820 MHz and non-detections at 
lower frequencies are unusual, as pulsar spectra generally 
turn over at frequencies around ~100 MHz. Thus, PSR 
J2007+2722 may belong to a small subset of pulsars with 
GHz-peaked spectra. Kijak et al. (2011) suggest that this 
behavior could be due to unusual environments, since PSR 
B 1259-63 exhibits such a spectrum at periastron. However, 
more such objects are necessary to draw any reliable conclu- 
sions. While only 5 such sources have been reported thus far. 
Bates et al. (2013) estimate that they may comprise up to 10% 
of the pulsar population. 

The pulse profile of PSR J2007+2722 is unusually broad: at 
1500 MHz the full pulse width between the outer half-maxima 
is ^ 224°. The folded pulse profiles at the four GBT ob- 
served frequencies are shown in Figure 14. All observations 
exhibit a double-peaked pulse profile with an emission bridge 
between and connecting the two peaks. The emission bridge 
flattens with increasing observation frequency and shifts loca- 
tion to from between the peaks at lower frequency to outside 
the peaks at 8900 MHz. This indicates that some radio emis- 
sion is present at all rotational phases in addition to the pulsed 
emission. 

For all frequencies at which the pulsar was detected, pulse- 
averaged flux densities were obtained from the GBT obser- 
vations; in combination with the flux density from the NVSS 
catalog and the VLA archival data, the pulsar's flux density 
has been measured at six different frequencies. Table 5 sum- 
marizes these measurements. 

Note that the flux density measurements from the GBT ob- 
servations are only sensitive to the pulsed emission. Fitting a 
single-component power law 

to measurements of the pulsed flux density S at frequencies 
u >l GHz, we obtain a spectral index ^ = —1.12(6), i.e. a 
relatively flat spectrum (Lorimer & Kramer 2004). The non- 
detection at 430 MHz may indicate an unusual low-frequency 

http : / /psr chive . source forge . net 
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Figure 14. The pulse profile at (starting from the top) 8900, 2000, 1500, 
and 820 MHz. The calibrated pulsed flux is given in Table 5. All the plots 
show an emission "bridge" between the two pulses, which shifts to outside 
the peaks at the highest frequency. This is evidence that the pulsar is "always 
on". Hence the pulsed flux shown in Table 5 is only a fraction of the total 
flux. 

turnover in the spectrum. 

POLARIMETRY 

The GBT observations also provided full Stokes polariza- 
tion parameters I,Q,U, and V at 1500 and 2000 MHz, from 
which the polarization angle 



^=-arctan^- 



(19) 



can be computed as a function of the pulsar rotation phase. 
These polarization-angle profiles are shown in Figure 15. The 
measurement errors Aijj were also estimated as a function of 
pulsar rotation phase, but are not shown. The observations at 
8900 GHz were too noisy for further processing. 

EMISSION GEOMETRY 

The polarization-angle profiles can be used to infer the 
beam geometry from the Rotating Vector Model (RVM; Rad- 
hakrishnan & Cooke 1969). In this model, the beam geometry 
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Figure 15. Full Stokes polarization-angle profiles at 1500 MHz (top) and 
2000 MHz (bottom) taken at GBT. The horizontal axis is rotation phase of 
the pulsar. The bottom half of each plot shows the radi o flux-den sity S in 
intensity / (solid), linearly polarized component L = \JU^ + (dashed) 
and circularly polarized component V (dash-dotted). The top half of each 
plot shows the derived polarization angle from Equation (19), corrected 
for Faraday rotation arising from the Galactic magnetic field. The correction 
has Rotation Measure RM = —230; by convention i\) has been shifted by a 
constant to give the polarization angle V^oo at infinite frequency. The dashed 
lines show V^rvm for the best-fit rotating vector model from Sec. 6.2. 

(with respect to the observer) is defined by four parameters. 
For these, the RVM predicts the polarization angle V^rvm as a 
function of the pulsar's rotation phase ^ as 



tan (?/;] 



RVM 



(20) 



sin (^ — 0o) sin (a) 



sin (C) cos (a) — cos ((") sin (a) cos ((/) — (/)o) 

Here we follow the sign conventions for angles^ ^ given in 
Lorimer & Kramer (2004): a is the angle between the mag- 
netic and the spin axes, /3 is the angle between the line of sight 
and the magnetic axis, C = <^ + and is the polarization 
angle at pulsar rotation phase . 

The RVM model contains four free parameters: a, V^o 
and ^o; the best-fit parameter values were determined by a 
least- squares method. We began with the measurements of the 
polarization angle i\) for A^isoo = 158 different values of the 
rotation phase at 1500 MHz, and for A/'2ooo = 143 different 
values of the rotation phase at 2 GHz, as shown in the upper 
parts of Figure 15. At each point of a 4-dimensional cubical 
grid (spacing 0.5°) in (a, C, V^o, 0o) -space, we calculated the 

Everett & Weisberg (2001) use a different parameters ot' , /3^ given by 
ot' = 180° — a and /3' = —/3. A "primed" copy of Equation 20 then holds, 
however for a polarization angle V^^y^ = — V^rvm, whose sign is opposite to 
that of this paper. 
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Figure 16. The reduced chi-squared values as a function of (a, Q, ob- 
tained by fitting the measured polarization angle to the RVM model Equa- 
tion (20) as described in the text. At each point the was minimized over 
00 and ipQ. The dashed lines are the contours of constant emission-cone 
half -opening-angle as defined by Equation (21). 

normalized sum of the squared-residuals, 

2 _ 1 - V^RVM(</>i))^ 

between the RVM-predicted and measured polarization an- 
gles. Here i labels the N = Ni^qq or N = A/'2ooo distinct 
pulsar rotation phases 0^ for which ipi = was mea- 

sured, and Aipi is the experimental measurement uncertainty 
in tpi. 

Because the number of degrees of freedom is — 4, 
is a conventionally-normalized reduced-chi-squared statistic. 
Values of near unity indicate that RVM fits the data well 
(consistent with Gaussian-distributed errors of with standard 
deviation Aip in the values of ip) and large values of in- 
dicate a poor fit. Figure 16 shows the minimum value of 
as a function of (a, C)^ note that the color code has a loga- 
rithmic scale. The minimum values obtained over all four 
parameters, and the corresponding best-fit parameter values, 
are shown in Table 6. These best-fit values are shown by black 
crosses in Figure 16. 

The corresponding best-fit polarization-angle profiles are 
displayed by dashed lines in the top panels of Figure 15. The 
fit is acceptable in the sense that it is not untypical when com- 
pared with other radio pulsars. Overall, the RVM reproduces 
the form of the observed profile, especially at 1500 MHz, but 
leaves unmodeled structure below pulse phase 0.2 and above 
pulse phase 0.9. The largest deviations are at 2000 MHz be- 
low phase 0.25. Nevertheless it is encouraging that the inde- 



Frequency 


a 


/5 




-00 




1500 MHz 


68.3(5)° 


7(1)° 


192(2)° 


-12.7(8)° 


3.13 


2000 MHz 


64.9(8)° 


5(1)° 


202(3)° 


-5(2)° 


3.74 



Table 6 

The best-fit RVM parameters for PSR J2007+2722 obtained from fitting the 
model in Equation (20) to the measured polarization angle as a function of 
pulsar rotation phase, is the minimum reduced-chi-squared value, and 
the numbers in parentheses are the estimated 1-sigma errors. 



pendent fits at 1500 and 2000 MHz lead to very similar beam 
geometry parameters, and surprisingly tight bounds on their 
values, as shown in Table 6. 

However the fit can not be characterized as good; the de- 
viations between data and model that are visible in Figure 15 
give rise to reduced-chi-squared values that have very low 
statistical likelihood of being explained by the polarization- 
angle measurement errors. The failure to fit the RVM very 
well may arise because the pulsar doesn't ever "shut off" but 
is emitting over it's entire rotation. This can affect the po- 
larimetry; in Figure 15 one can see regions where the intensity 
L of the linearly-polarized component is greater than the total 
intensity /. This can not happen in nature; the inconsistency 
probably indicates that some aspect of the polarimetry mea- 
surement can not be trusted. It could well be an artifact of not 
being able to identify the uniform level of flux corresponding 
to zero pulsed emission. However the lack of a good fit is also 
consistent with the interpretation that PSR J2007+2722 is a 
disrupted recycled pulsar: many recycled pulsars are not well- 
fit by the basic RMV (Thorsett & Stinebring 1990; Navarro 
et al. 1997; Xilouris et al. 1998; Stairs et al. 1999). 

One can infer the opening-angle of the radio emission-cone 
from the RVM parameters together with the observed sep- 
aration between the pulse peaks. The emission-cone half- 
opening-angle p is related to the measured separation W of 
the pulse peaks by 

cos (p) = cos {a) cos {() + sin (a) sin {() cos ( ^ j • (^1) 

At 1500 MHz we estimate a peak-to-peak width of the pulse 
profile 1^1500 = 163°; at 2000 MHz I^sooo = 171°. For 
these values of W, the dashed lines in Figure 16 show con- 
tours of constant emission-cone half-opening-angle p as a 
function of a and (. Using the best-fit aX values from 
Table 6 we obtain radio-emission-cone half-opening-angles 
P1500 = 77° and pisoo = 78° at 1500MHz and 2000 MHz, 
respectively. 

6.3. Timing Model 

A timing model for PSR J2007+2722 has been found us- 
ing two distinct data sets, obtained at the Arecibo Observa- 
tory and at Jodrell Bank. The Arecibo data were collected in 
two short (268 s) survey observations on 11 and 16 Febru- 
ary 2007; the first of these provided the data used in the Ein- 
stein@Home discovery. The Jodrell data were collected in 75 
targeted observations between 15 July 2010 and 30 Novem- 
ber 2012, starting soon after the discovery. To obtain a timing 
model, the observational data were first reduced to a set of 
TOAs. 

The Arecibo data (described earlier) covering a 100 MHz 
bandwidth centered at 1452 MHz, were used to construct 
TOAs in four distinct 25 MHz frequency bands. A model 
pulse profile was used to obtain 22 distinct TOAs. 
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The Jodrell Bank observations used a dual-polarization 
cryogenic receiver on the 76-m Lovell telescope, having a sys- 
tem equivalent flux density of 25 Jy on cold sky. Observations 
typically lasted 20 or 30 minutes. Data were processed using a 
digital filterbank which covered a bandwidth of 350 MHz cen- 
tered around 1525 MHz in channels of 0.5 MHz bandwidth. 
The data were folded at the nominal topocentric period of the 
pulsar for sub-integration times of 10 seconds. After inspec- 
tion and removal of any RFI, the profiles were de-dispersed 
and sunmied over frequency and time to produce integrated 
profiles. For each observation, a single TOA as obtained by 
cross-correlation of the profile with a standard template using 
standard analysis tools from P SRC HIVE. 

The 97 distinct TOAs were analyzed using the TEMP02 
software package (Hobbs et al. 2006; Edwards et al. 2006). 
In the fitting procedure to determine the pulsar parameters, a 
single adjustable offset time was introduced between the two 
data sets. This free parameter is called a "jump" in TEMP02; 
it is necessary because different model pulse profiles were 
used to derive the Arecibo and Jodrell TOAs. It also avoids 
the complications inherent in getting absolute time synchro- 
nization between the two observatories. 

The parameters of PSR J2007+2722 obtained from this 
TEMP02 analysis are shown in Table 7; the resulting fitting 
residuals are shown in Figure 17. The fit is remarkably good: 
the residuals have a weighted rms of 66 /iS and the reduced 
= 1.059 is very close to unity. In pulsar astronomy it is 
Standard practice to re-scale the uncertainties by the square- 
root of this value; we have done that here, but it only changes 
the estimated one-sigma errors by about 3%. 

The pulsar parameters obtained by timing (sky position, 
frequency, and spindown) are reasonably consistent with the 
announcement paper (Knispel et al. 2010) published one 
month after the discovery^^. That paper gave the sky position 
(found as described in Section 6.1) as RA 20h 07m 15.77s, 
DEC 27°22M7.7'' with errors less than order 1 arcsecond. 
The position found here is consistent with that. The dis- 
covery paper gave the frequency (at MJD 55399) as / = 
40.820677620(6) Hz. The frequency found here is about one 
standard deviation outside of that range; this may have been 
due to our lack of knowledge about the precise spin-down 
rate. Finally, the discovery paper only gave a bound on the 
spin-down rate, of |/| < 3 x 10~^^/s^. Here, with a much 
longer observational data set, the spin-down has been deter- 
mined to be consistent with that: / = — 1.6xl0~^^/s^. This 
corresponds to a characteristic age — //2/ = 404 Myr, an in- 
ferred surface dipole magnetic field strength of 4.9 x 10^ G, 
and a spin-down luminosity £^ = 2.6 x 10^^ erg/s (assuming 
the canonical moment-of-inertia / = 10^^ g cm^). 

6.4. Multiwavelength Electromagnetic Counterparts 

With the final sky position given in Table 7, we searched for 
electromagnetic counterparts at different wavelengths. The 
pulsar is not in any known globular cluster or near a cata- 
loged supernova remnant (Green 2009). We then checked 
infrared, gamma-ray and X-ray catalogues for counterparts. 
Infrared: The nearest sources visible in infrared images (J, 
H, K-band) obtained from the Two Micron All Sky Survey 
(2MASS, Skrutskie et al. (2006)) are more than 13'' distant 
from the pulsar position. Gamma-ray: No counterpart was 

To facilitate comparison, Table 7 specifies the pulsar's parameters at the 
same epoch as Knispel et al. (2010), rather than at the (more conventional) 
midpoint of the observational sample. 
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Figure 17. Timing residuals obtained by fitting a timing model to TOA data 
from Arecibo Observatory (taken in February 2007) and TOA data from Jo- 
drell Bank (taken between July 2010 and November 2012). The horizontal 
axis is the date of the TOA observation, and the vertical axis is post-fit resid- 
uals in seconds. 



Fit and data- set 

Pulsar name JJ2007+2722 

MJD range 54142.7—56261.4 

Number of TOAs 97 

Rms timing residual {jis) 65.6 

Weighted fit Y 

Reduced value 1.057 

Measured Quantities 

Right ascension, a 20:07:15.8288(4) 

Declination, 5 -^27:22:47.914(6) 

Pulse frequency, v {^-^) 40.820677605083(15) 

First derivative of pulse frequency, z>(s~^) — 1.60 15(4) xl0~^^ 

Dispersion measure, DM (cm~^pc) 127.0(4) 

Set Quantities 

Epoch of frequency determination (MJD) 55399 

Epoch of position determination (MJD) 55399 

Epoch of dispersion measure determination (MJD) 55399 
Derived Quantities 

log (Characteristic age, yr) 8.61 

log (Surface magnetic field strength, G) 9.69 

log (Canonical spin-down luminosity, erg/s) . . . 33.4 
Assumptions 

Clock correction procedure TT(TAI) 

Solar system ephemeris model DE405 

TDB units (tempo 1 mode) Y 

FB90 time ephemeris (tempo 1 mode) Y 

T2C (tempo 1 mode) Y 

Shapiro delay due to planets N 

Electron density at 1 AU (cm~^) 9.96 

TEMPO model version number 2.00 



Table 7 

The parameters describing PSR J2007-I-2722 obtained by timing analysis of 
data spanning about six years. Figures in parentheses are the nominal la 

TEMP02 uncertainties in the least-significant digits quoted. For easy 
comparison, the Epoch has been chosen to be the same as Knispel et al. 
(2010) rather than at the midpoint of the observational interval. 

found in the second Fermi Large Area Telescope Point Source 
Catalog (Nolan et al. 2012). X-ray: There are three archival 
Chandra X-ray observations^^ ; from these, no X-ray counter- 
part could be identified. We then carried out more detailed 
followups starting from the raw gamma-ray and X-ray data as 
described below. 

Since the launch of Fermi in 2008, the on-board Large Area 
Telescope (LAT) (Atwood et al. 2009) has observed pulsa- 
tions from more than 120 pulsars^"^, and new blind-search 
methods similar to those used in this paper are finding even 
more (Pletsch et al. 2012b,c,a). The LAT has also con- 

http : / /heasarc . gsf c . nasa . gov/ docs /archive . html 
See https://confluence.slac.stanford.edu/ 
display /GLAMCOG/Public+List+of+LAT-Detected+ 
Gamma- Ray+Pulsars/ 
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Figure 18. The population of known radio pulsars, plotted as a function of 
spin-period (horizontal axis) and rate of change of the spin-period with time 
(vertical axis). PSR J2007-I-2722 is at the intersection of the dotted lines: a re- 
gion populated almost exclusively by old recycled pulsars in binary systems, 
indicated by circled points. In contrast to PSR J2007-I-2722, almost all iso- 
lated pulsars (uncircled points) are in the region populated by much younger 
non-recycled systems. 

firmed that many radio-detected, both normal and milHsec- 
ond, pulsars are emitting rotation-phase- synchronous gamma- 
rays (Abdo et al. 2009; Ray et al. 2012). So, we here consider 
the possibility of PSR J2007+2722 also being a gamma-ray 
pulsar. 

Unfortunately the characteristics of PSR J2007+2722 make 
it an unlikely source for gamma-ray emissions or pulsations, 
when comparing to the known ganmia-ray pulsar population 
Abdo et al. (2013). Its spin-down power ^ = 2.6 x 10^^ erg/s 
is near the lower end of the known gamma-ray pulsar popula- 
tion, and at a distance ofd = 5.4 kpc, the spin-down flux den- 
sity E/d'^ is smaller than that of any known gamma-ray emit- 
ting pulsar by a factor of a few. In addition, PSR J2007+2722 
is in a high-background region close to the Galactic plane. 
The Fermi-LKT Second Source Catalog (Nolan et al. 2012) 
does not contain any source positionally overlapping with the 
pulsar's location. 

Nevertheless, we searched the LAT data for gamma- 
ray pulsations synchronous with the radio-pulse rotation 
phase. We extracted the LAT photons within 2 degrees of 
PSR J2007-f2722's sky position from the start of data tak- 
ing in August 2008 up to January 2013. We folded them 
for different cuts on minimum energy (between 40 MeV and 
0.8 GeV) and different angular cuts (between 0.5 and 2 de- 
grees). There was no sign of a signal; the LAT does not detect 
gamma-ray pulsations from PSR J2007+2722. In principle, 
one could carry out a spectral analysis of the region and con- 
struct a source model for PSR J2007+2722 to assign proba- 
bility weights to the LAT photons as in Pletsch et al. (2012b). 
However, given the extremely low pulsation significance of 
the unweighted fold, we concluded this was unlikely to make 
much of a difference. 

6.5. X-ray limits, and the nature of PSR J2007-\-2722 

As shown in Figure 18, timing measurements of 
PSR J2007+2722 place it in a region of the (P, P)-diagram 
normally occupied by old neutron stars in binary systems 
spun-up due to accretion torques (i.e. "recycled"). These 



pulsars naturally have shorter periods (P ^ 100 ms) than 
the younger, isolated rotation-powered pulsars and are con- 
strained to lie below the spin-up limit for recycled pulsars 
P(ms) = 1.9(P/10^ G)^/^ (van den Heuvel 1987), where 
the magnetic field restricts the minimal achievable rotation 
period. 

Together with the lack of a stellar companion at any wave- 
length or unmodeled systematics in the timing residual to in- 
dicating otherwise, there is no evidence that PSR 12007-^2722 
is currently part of a binary system. Instead, its moderately 
short period suggests that it was partially recycled and is pos- 
sibly a disrupted recycled pulsar (DRP). These isolated NSs 
are born in a binary system and become unbound by a second 
supernova event involving the companion; they are defined in 
Belczynski et al. (2010) as isolated radio pulsars in the Galac- 
tic disk with magnetic field strength |P| < 3 x 10^^ G and 
spin-frequency / < 50 Hz. Their evolutionary origin explains 
their location on the region of the (P, P)-diagram which is 
populated by weak magnetic field pulsars, whose fields have 
decayed over ~ 10^ yr. The work by Belczynski et al. (2010) 
describes the 12 DRPs known at the time of publication; one 
more (PSR J 182 1+0 155) has subsequently been discovered 
(Rosen et al. 2012). PSR J2007+2722 would be the 14th and 
most rapidly spinning member of this class. 

DRPs are an enigma: standard evolutionary models for bi- 
nary systems cannot easily explain the observed ratio of iso- 
lated recycled pulsars relative to the number of double neu- 
tron star systems (Belczynski et al. 2010). The models pre- 
dict about one double neutron star system for every ten DRPs, 
but roughly equal numbers are observed. Furthermore, there 
is no independent evidence that all isolated pulsars overlap- 
ping the binary population are actually derived from bina- 
ries. Indeed, recent observations of manifestly young pulsars 
in supernova remnants reveal that neutron stars can be born 
with anomalously low surface dipole magnetic fields of order 
B - 10^^ G (see Gotthelf et al. 2013b for details). These 
so-called anti-magnetars occupy an overlapping region in the 
(P, P)-diagram with the DRPs and therefore suggest that their 
descendants might be mis-identified as DRPs (Gotthelf et al. 
2013a). If in fact PSR J2007+2722 is a young object instead 
of a ^ 10^ yr-old DRP, neutron star cooling curves predict 
that thermal X-ray surface emission should be observable for 
up to 1 Myr (Page et al. 2009), long after its supernova rem- 
nant has dissipated. After this time, the internal temperature 
drops rapidly and thermal emission becomes negligible. 

To investigate the possibiUty that PSR J2007+2722 might 
be a young, hot object, we examined fortuitous archival X- 
ray observations covering the location of the pulsar. A total 
of 95 ks of good Chandra ACIS-I data are available as data 
sets ObsIDs 6438, 7254 and 8492, acquired on 2006 Decem- 
ber 10, 2007 January 07 and 29, respectively (Giinther et al. 
2012). The expected location of the pulsar falls 6' off-axis 
for each observation, where the point response function of the 
telescope is degraded to (99% enclosed energy fraction). 
Within the nominal absolute astrometry error of 0^6 radius no 
X-ray source is found that overlaps with the subarcsec pul- 
sar coordinates presented herein. As shown in Figure 19, the 
closest source is 14'.'3 away from PSR J2007+2722. 

To attempt to place a lower limit on the age of 
PSR J2007+2722 we use the Chandra data to determine the 
minimum detectable flux expected from a cooling neutron star 
of radius = 14 km at the DM derived distance of 5.4 kpc. 
Following the method described in Gotthelf et al. (2013a), we 
compute an upper-limit on the number of expected counts for 
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a non-detection at the 99.73% confidence level (3a). Based 
on the local background rate of 1.6 x 10~^ cps in the r = 5" 
aperture, we require 6.5 photons from the pulsar in the com- 
posite ACIS-I observation in the 0.3 — 2 keV energy band at 
the off-axis pulsar location. We convolve an absorbed black- 
body spectrum with the telescope response function generated 
for these observations and integrate over the energy band to 
compute the detected number of counts as a function of tem- 
perature. The blackbody normalization is fixed to the ratio of 
the neutron star radius to its distance and the column density 
is set to Nn ?^ 4 X 10^^ cm~^, estimated from the DM and 
by assuming a rule-of-thumb N^/Nn ~ 0.1. This procedure 
yields a temperature of kT ^ 69 eV and bolometric luminos- 
ity of L(bol) ^ 6 X 10^^ erg s~^ implying a lower limit on 
the neutron star cooling age ^ 5 x 10^ yr. This is an order-of- 
magnitude older than would be expected for a typical young 
neutron star, according to the range of cooling curve models 
(Page et al. 2009). 

The uncertainty in this upper limit is difficult to estimate. 
The contribution from the unknown column density depends 
on the uncertainty in the Galactic electron density distribu- 
tion, estimated as 20% in §5. A recent calibration of the ra- 
tio Ne/Nn shows over an order-of-magnitude scatter in this 
relationship (He et al. 2013). If A^h varied by an order-of- 
magnitude away from our assumed value, then the lower lim- 
its on the age could be as small as 10^ yr. Moreover, the 
effects of any uncertainty on A^h are amplified because the 
derived temperature falls at the edge of the ACIS-I response 
function where the detector sensitivity falls off rapidly. 

It appears unlikely that PSR J2007+2722 is a young pul- 
sar, but current data cannot prove that it was formed through 
recycling in a binary system versus being simply an isolated 
pulsar born with a low magnetic field. For a typical rotation- 
powered pulsar emitting non-thermal X-rays with power-law 
spectrum of photon index F = 1.5, the 2 — 10 keV luminos- 
ity upper limit for PSR 12007-^2722 is 2.2 x 10^^ erg s-^ 
However, based on its spin-down energy of E = 2.58 x 10^^ 
erg s~^, the predicted X-ray luminosity in this band is only 
Lx = 2.7 X 10^^ erg s'^ (Possenti et al. 2002). So no definite 
constraint is possible. 

7. DISCUSSION AND CONCLUSIONS 

The Einstein® Home project continues to analyze three dif- 
ferent types of data: from gravitational- wave detectors, from 
radio telescopes, and most recently from the Fermi gamma- 
ray telescope. To date, Einstein® Home has discovered al- 
most fifty radio pulsars using the methods described here. 
Some of these discoveries have already been published (Knis- 
pel et al. 2011, 2013) and others are forthcoming. The search 
of PALFA data will continue as the survey progresses. In the 
future we plan to search survey data from the Effelsberg tele- 
scope and perhaps also data from the High Timing Resolution 
Universe (HTRU) survey. 

More than two years after the first Einstein@Home radio 
pulsar discovery, we have given a detailed description and full 
timing model for PSR J2007-F2722. It is satisfying to see that 
the complex set of deductions used in the days after the ini- 
tial discovery, using a combination of new observational data 
and cataloged survey data, did indeed correctly identify the 
pulsar's position. Evidence from the polarization studies and 
from the definitive measurement of the pulsar's spin-down 
rate can not prove that it is a disrupted recycled pulsar, but 
support this hypothesis. 

In the future, Volunteer Distributed Computing might play 
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Figure 19. Chandra ACIS X-ray image (0.3 — 2 keV) of the field containing 
PSR J2007+2722, whose location is marked by the cross. The field-of-view 
is 3' X 3^ The nearest resolved point source is 14^.^3 away. 

an even larger role in radio astronomy. For example to carry 
out a complete pulsar survey around the end of this decade, 
using data from the upcoming Square Kilometer Array (SKA) 
will requires Exaflop computing resources (Smits et al. 2009). 
We expect that this will be pushing "state of the art" in 
computing and thus will be challenging and expensive. But 
based on reasonable extrapolations about consumer comput- 
ing hardware, several million volunteers should be able to pro- 
vide those compute cycles at very low cost to the scientific 
community or funding agencies (Allen et al. 2013). 

Volunteer Distributed Computing might also provide a 
novel solution for SKA data storage (Allen et al. 2013). As 
currently planned, SKA will have 0(1000) antennae, each 
producing data at approximately Gb/s rates, yielding a total 
raw data volume of some Tb/s. This rate so large enough that 
it is planned to store this raw data for only a few hours until 
it is processed and then replaced with fresh data. In contrast. 
Volunteer Distributed Computing might permit all SKA data 
to be stored /(9r^v^r, broadening the range of scientific work 
that could be carried out. This is possible because the capac- 
ity of the public Internet is anticipated to continue growing by 
40% annually at least through the end of the decade. Simi- 
larly, the capacity of personal computer storage devices is ex- 
pected to continue to grow exponentially, reaching ^ 50 TB 
by the end of the decade. It is therefore sufficient if several 
million volunteers provide a fraction of their storage capac- 
ity; existing file-sharing and replication techniques could pro- 
vide a statistical guarantee of retrievability and validity. The 
key requirement is that SKA have a Tb/s network connection 
to the public Internet, presumably in a major city. From that 
point on, the data rates would represent only a small fraction 
of the public Internet capacity^^ . 

Extrapolating a few years into the future, we expect that 
laptop and desktop computers will provide a decreasing frac- 

For example the Amsterdam Internet Exchange (February 2013) handles 
average traffic volumes of 1.5 Tb/s with peaks over 2 Tb/s: https : / / www . 
ams-ix.net/statistics . 
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tion of the compute cycles available from volunteers. Based 
on market trends, we expect to reap these cycles from tablets, 
E-book readers, and smartphones. These devices have Inter- 
net connectivity and typically spend a substantial fraction of 
time plugged into charging devices. During this time, they can 
provide a very large number of compute cycles. Although the 
devices provide less compute power than conventional CPUs, 
they are extremely power-efficient and are being sold in very 
large numbers. 

In short, we believe that the approach described here is not 
a fad, and will provide a substantial computing resource for 
astronomy in the long term. 
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